Feature Extraction and Selection of Sentinel-1 Dual-Pol Data for Global-Scale Local Climate Zone Classification

The concept of the local climate zone (LCZ) has been recently proposed as a generic land-cover/land-use classification scheme. It divides urban regions into 17 categories based on compositions of man-made structures and natural landscapes. Although it was originally designed for temperature study, the morphological structure concealed in LCZs also reflects economic status and population distribution. To this end, global LCZ classification is of great value for worldwide studies on economy and population. Conventional classification approaches are usually successful for an individual city using optical remote sensing data. This paper, however, attempts for the first time to produce global LCZ classification maps using polarimetric synthetic aperture radar (PolSAR) data. Specifically, we first produce polarimetric features, local statistical features, texture features, and morphological features and compare them, with respect to their classification performance. Here, an ensemble classifier is investigated, which is trained and tested on already separated transcontinental cities. Considering the challenging global scope this work handles, we conclude the classification accuracy is not yet satisfactory. However, Sentinel-1 dual-Pol SAR data could contribute the classification for several LCZ classes. According to our feature studies, the combination of local statistical features and morphological features yields the best classification results with 61.8% overall accuracy (OA), which is 3% higher than the OA produced by the second best features combination. The 3% is considerably large for a global scale. Based on our feature importance analysis, features related to VH polarized data contributed the most to the eventual classification result.


Introduction
The local climate zone (LCZ) classification system is designed as a categorical scheme with 17 classes that describe urban landscapes [1,2]. These classes are defined based on surface structures and surface covers, which are specifically (1) the height of the surface structure, (2) spatial density of the surface structure, and (3) covering material of the surface (i.e., as shown in Figure 1). Eventually, the scheme yields thematic maps of a 100-m ground sampling distance (GSD), with each pixel labeled as one of the 17 classes. The LCZ was designed for the study of urban temperature behavior. It provides a research framework for urban heat island studies and standardizes the worldwide exchange of urban temperature observations [1]. As also shown in Figure 1, the scheme essentially demonstrates morphological structures of urban local neighborhood. The urban morphological structure is an influential factor on thermal behaviour. It may also reveal the economic status and the population distribution of a particular city. For example, slum districts, which are less economically developed regions with massive population concentration, normally appear as the seventh class in Figure 1. Therefore, thanks to the described morphological structure, the LCZ map acts as a valuable source for a wide variety of studies in urban areas. Following the introduction of LCZs, the World Urban Database and Portal (WUDAPT, http://www.wudapt.org) was initiated [3,4]. WUDAPT has been mainly developed by researchers to obtain high-quality land-cover/land-use information globally, usually via crowdsourcing [5,6], games [7], or other challenges [8]. The WUDAPT project presents a suggested workflow to produce the LCZ map by taking advantage of remote sensing techniques. The production process briefly functions as follows [3]: First, the region of interest (ROI) for a particular city is defined and labels of all classes are manually selected. Second, multispectral images captured by LandSat-8 are prepared for the ROI of the target city. Finally, a supervised classification (i.e., random forest [9]) is applied to the multispectral data to produce the final classification map. Besides the standard production of the WUDAPT project, studies of LCZ classification mainly focus on using optical remote sensing data [10][11][12].
One of the key factors to define LCZ classes is height. A few studies on LCZ classification were introduced to fuse the digital surface model (DSM) with the optical data in order to use both height and spectral information [13,14]. The Thematic Mapper (TM) data captured by LandSat-5 and Enhanced Thematic Mapper Plus (ETM+) data of LandSat-7 were fused with the normalized digital surface model (NDSM) and airborne Interferometric Synthetic Aperture Radar (InSAR) using feature concatenation and then applied to multiple classifiers for LCZ-related classification in [13]. In [14], LandSat-8 data and the digital surface model (DSM) were also concatenated at the feature level and then classified by a random forest and support vector machine [15]. It was concluded that spectral features can significantly contribute to the classification task. Geographical information system (GIS)-based approach is also developed to produce the LCZ map in [16][17][18]. Although Open Street Map (OSM) provides a free-accessible GIS dataset, the completeness of the GIS data needs to be improved, especially for the developing countries. Besides the aforementioned data sources, the Sentinel-1 mission provides a dual-polarimetric synthetic aperture radar (dual-Pol SAR) with free access and global coverage. It has also been studied for LCZ classification in [19]. Researchers have proven that the combination of Sentinel-1 dual-Pol data and LandSat-8 data can improve the performance of LCZ classification. However, they only take used of amplitudes of VV and VH channels and their corresponding texture features derived by gray-level co-occurrence matrix (GLCM). Amplitudes of the two channels only constitute one part of the polarimetric information provided by the dual-Pol data. One important feature, the coherence of the two channels, is missing. Therefore, the first goal of this work is to comprehensively investigate the polarimetric information of Sentinel-1 dual-Pol data for the LCZ classification task.
Global LCZ mapping offers substantial help in exploring local climates on regional and worldwide scales. Several studies have successfully produced LCZ classification maps corresponding to one city by only labeling samples of that city. In this manner, the produced classification maps have achieved high classification accuracy (e.g., overall accuracies (OAs) are beyond 80%) since both training and test samples are located in the same city. However, the collection of accurate training samples are either expensive or time-demanding. Therefore, in the remote sensing community, there is great interest in training models based on the samples available for some cities and applying the trained models to other cities. However, this is a challenging task. For example, one study [20] attempted to select training samples from one city for the classification of another city using RF. The classification accuracies dropped to 18.2%, which indicates that the knowledge transferability between different cities should be carefully considered. In this regard, our second goal is to develop a classifier with adequate generalization capability to be applied to any other cities. The difficulty in this task lies in that the classifier needs to be trained using a limited number of training samples while remaining applicable to handling transcultural, transnational, and cross-environmental data in a worldwide context.
To cope with the aforementioned challenge of generalization, the 2017 Geoscience and Remote Sensing Society (GRSS) data fusion contest of the year 2017 proposed training the classifier on five cities (Berlin, Hong Kong, Paris, Rome, and Sao Paulo) and testing the results on four other cities (Amsterdam, Chicago, Madrid, and Xi'an). Although deep learning-based classification methods have proven to be strong in terms of classification accuracy a generalization capability in the remote sensing community [21][22][23][24], the ensemble-based canonical correlation forest (CCF) classification strategy achieved the best performance in the contest, among more than 800 submissions [8,25]. Therefore, this work uses the CCF classifier to pursue a solution for our task. The CCF classifier is an advanced version of a random forest, which is a shallow classifier.
In contrast with the automatic feature selection and extraction of deep learning methods, the feature design is of key importance to shallow classifiers. From the perspective of feature space, especially for dual-Pol SAR data, a limited number of features is not adequate for a complicated classification like the LCZ task. An informative and appropriate number of features should be derived for the subsequent classification task. References [13,25] indicates that local statistical features are informative features regarding LCZ classifications. Texture features derived from GLCM have been proven to be informative for applications of SAR data [26][27][28][29][30]. Mathematical morphological features obtained from a morphological profile have been highly effective in multi/hyperspectral image classification [31][32][33]. Consequently, besides polarimetric features, we investigate the performances of local statistical features, GLCM features, and morphological profiles for LCZ classification on a global scale.
To sum up, the aims of the study are threefold for local climate zone classification. (1) Comprehensive polarimetric information of the Sentinel-1 dual-Pol data is investigated, which includes intensities of VV and VH channels as well as the coherence and relative phase of the two channels. (2) Classification on a global scale is studied by training and testing the CCF on the separated data of transcontinental cities, which involves terabytes of data volume. (3) Four features (polarimetric feature, local statistical feature, texture feature, and mathematical morphological feature) that were proven to be successful in related tasks are evaluated in our scenario of global-scale LCZ classification.
The rest of the paper is organized as follows. Section 2 demonstrates the principle of selecting a study area and describes the Sentinel-1 dual-Pol data and its data preparation. Section 3 introduces our methodology of global-scale LCZ classification. Section 4 discusses the experiment results regarding feature extraction and selection. Lastly, Section 5 concludes this work.

Study Areas
Our study aims to produce LCZ maps on a global scale and also focuses on cities of high population density. In total, 29 cities were selected and listed in Table 1. They are located on all continents except Antarctica, shown in Figure 2. This geographical distribution ensures that cities of interest include transcultural, transnational, and cross-environmental regions. While selecting these cities, population was another criteria under consideration. Among all cities, the population of each city was at least one million in 2016 and is expected to grow in the future according to UN statistics [34]. To enable our framework to solve the global challenge, cities of different regions were selected for both training and testing. The selection is shown in Figure 2 and Table 1.

Ground Truth
To produce LCZ classification maps in a supervised manner, we manually created the ground truth for the selected 29 cities. Generally, the labeling procedure followed the WUDAPT project standard procedure [4]. First, the region of interest (ROI) was decided for each selected city as a 50-by-50-kilometers rectangle centered at the city center. Within the rectangle, the ground truth, polygons of LCZ classes, were manually delineated by observing satellite images on Google Earth (https://www.google.com/earth/). Then, LandSat-8 images were prepared for the ROIs of each city. Afterwards, the software SAGA GIS (www.saga-gis.org/en/index.html), taking the delineated ground truth and LandSat-8 data as inputs, produced an LCZ classification map using the random forest classifier. The produced classification map is aim to be used as an additional validation source for checking the correctness and completeness of the ground truth data. By manually cross checking the classification map, images on Google Earth, and the delineated ground truth, the ground truth is modified if necessary, in the regard of correctness and completeness. Eventually, the delineated polygons of LCZ classes are created as the ground truth data, which are used for training and testing in this work. The delineated ground truth data are shown in Figure 3 with Sentinel-1 data as background.

Sentinel-1 Dual-Pol Data
The Sentinel-1 mission, as the SAR component of the European Copernicus program, has a constellation of two satellites each mounted with a C-band Synthetic Aperture Radar sensor. It has global coverage with a temporal resolution of six days. Additionally, the data is freely accessible.
The sensor collects data in four modes: (1) Stripmap (SM), (2) Interferometric Wide swath (IW), (3) Extra Wide swath (EW), and (4) Wave (WV). We used a level-1 product, which is focused single look complex data collected from the Interferometric Wide swath mode, because of its large coverage and availability. The level-1 Interferometric Wide swath SLC product consists of one image per sub-swath (three-sub-swaths, IW1, IW2, and IW3), per polarization channel (two polarization channels: VH and VV), resulting in six images in total. The properties of different swaths are shown in Table 2 while Table 3 shows the common properties of all sub-swaths.  Figure 4 presents the flowchart of data preparation for this work. It generally consists of two main parts-data downloading and data preprocessing-which are indicated as orange and blue blocks, respectively.

Data Preparation
Regarding data downloading, the Sentinel-1 data set is accessible to any users via the Copernicus Open Access Hub (https://scihub.copernicus.eu/) (also known as the Sentinels Scientific Data Hub). An open-source toolbox, named SentinelSat (https://github.com/sentinelsat/sentinelsat), provides the utilities of searching, downloading, and retrieving the metadata of Sentinel satellite images. An automatic data downloading tool was developed using SentinelSat, which functions based on an ROI file and a given time period of data collection.
For data processing, an ESA toolbox, the Sentinel Application Platform (SNAP, https://step.esa.int/main/toolboxes/snap/), was designed to work with data provided by Sentinel missions. The Sentinel-1 tool box [35] was integrated as a module to deal with all Sentinel-1 data products. The toolbox provides a powerful kit, named the graph processing tool (GPT), which is able to handle large data processing. Based on the GPT, an automatic Sentinel-1 data preprocessing chain was developed in our work so that data could be prepared for the classification task.  As shown in the flowchart, a series of data preprocessing modules are applied to Level-1 Sentinel-1 dual-Pol data products by GPT. The functionalities and configurations of these modules are explained in detail as follows: • Apply Orbit Profile: This module of preprocessing downloads the latest released orbit profile so that a precisely geocoded product can be achieved. • Radiometric Calibration: Radiometric calibration aims to convert the digital number of the pixel to a radiometrically calibrated backscatter, which is directly related to the radar backscatter of the scene. To extract the relative phase and the correlation between VV and VH, the product of calibration was chosen as a complex valued image. • TOPSAR Deburst: For each polarization channel, the Sentinel-1 IW product has three swaths. Each swath image consists of a series of bursts. TOPSAR Deburst merges all these bursts and swaths into a single SLC image. • Polarimetric Speckle Reduction: Speckle reduction was conducted by using the SNAP-integrated refined Lee filter, with a window size of seven by seven [36,37]. • Terrain Correction: Terrain correction eliminates the distortion introduced by the topographical variations. To accomplish the correction, the SRTM was used as the DEM to provide height information. The data was re-sampled to a 10-m GSD by the nearest-neighbor interpolation.
The data was geocoded into the WGS84/UTM coordinate system, in which the manually labeled ground truth data was coordinated, so that the ground truth data and Sentinel-1 data could be matched in terms of geo-location.
After the data preprocessing, the analysis-ready dual-Pol data is organized in the common PolSAR covariance matrix. The processed Sentinel-1 dual-Pol data of 29 cities are shown in Figure 3.

Methodology
The main building blocks of the proposed global-scale classification approach are feature extraction and classification, which will be detailed in the following subsections.

Feature Extraction
Following the data preparation described in Section 2, the Sentinel-1 dual-Pol data was processed to the commonly used PolSAR covariance matrix, with a size of two by two. Based on the above mentioned preprocessed dual-Pol data, four different types of features were derived. They were, namely, polarimetric, local statistical, texture, and mathematical morphological feature, which will be described in the following subsections: 3.1.1. Polarimetric SAR polarimetry allows for the retrieve of shape, orientation, and dielectric property information of scatterers [38,39]. Since there are multiple polarimetric channels, it provides more information than single-pol SAR data. However, the richness of polarimetry is achieved by sacrificing the spatial resolution. To balance the trade-off, instead of a fully polarized SAR, Sentinel-1 mission provides partially polarized SAR data, known as dual-Pol data, with the VV and VH channels. To use the polarimetric information of Sentinel-1 data, we used the intensity of the VH channel (|S V H | 2 ), intensity of VV channel (|S VV | 2 ), normalized coherence of VH and VV and relative phase of VH and VV (actan2( S V H S * VV )), where S VV and S V H are the complex signals of VV and VH channels, and * denotes complex conjugate. These four features contain essential polarimetric information provided by the dual-Pol data. This polarimetry combination is able to distinguish specular scattering from diffuse scattering [40]. For the purpose of LCZ classification, these features are highly beneficial to differ classes with different surface roughnesses, such as water, plant, building, and soil. We named them as Pol-Baseline in our experiments.

Local Statistical
Since the morphological structure of an urban neighborhood is one of the essential factors that local climate zone classes try to describe, it is natural to derive features describing the local neighborhood. It has been shown that simple statistical parameters of a local neighborhood the mean and standard deviation are suitable features for the classification in [13,25]. In our global-scale task, we extracted five statistical parameters: maximum, minimum, mean, standard deviation, and median of local patches. Since the ground sampling distance (GSD) of the LCZ map was suggested to be 100 ms [3,25], the local patch in this work was defined as a size of 10 by 10 pixels, corresponding to the suggested 100 meters GSD. Those parameters were derived from all four polarimetric features (Pol-Baseline), resulting in 20 features. We named the local statistical feature the Stat-Feature.

Texture
In general, SAR data is well known for containing texture information [27,29,41]. Dual-Pol SAR data is even richer in this regard, simply because it has one more channel. The GLCM is used here to extract texture features. The GLCM describes the distribution of co-occurring values of an image in a given area. It provides a statistical view of texture based on the image histogram [42]. This work extracts the GLCM-based texture information from Sentinel-1 dual-Pol data for LCZ classification. The GLCM statistical features used for describing the distributions include contrast, dissimilarity, homogeneity, angular second moment, maximum probability, entropy, mean, variance, and correlation. For more details of those features, please refer to [42]. Those features can characterize specific characteristics of images, such as homogeneity, contrast, and organized structures presented in an image. For the purpose of LCZ classification, the compactness relates to the spatial distribution of deterministic scatterers in a SAR image, which can be represented by GLCM features. Thus, the GLCM-based texture features are expected to be beneficial to classify LCZ classes with respect to compactness. To compute the GLCM, it was applied to intensity images of VV and VH channels. For the computational efficiency, the intensity value was quantized into 32 bins. The orientations 0 • , 45 • , 90 • and 135 • were chosen. A window size of 11 pixels was chosen. Since the ground sampling distance of the data was 10 ms, the window size suits the 100-m resolution definition of the local climate zone product. The ESA SNAP toolbox was used for GLCM extraction because of its fast implementation. The feature was named the GLCM-Feature.

Mathematical Morphological
Spatial-contextual information also plays an important role in LCZ classification. Morphological profiles are regarded as one of the most effective techniques for the extraction of spatial-contextual features and have been intensively used in the remote sensing community for information extraction and scene classification using optical data [33,[43][44][45] and SAR data [27]. Very recently, the advantage of using morphological profiles has become apparent in the application of LCZ classification [8].
The main building blocks of morphological profiles are opening and closing operation [46]. These morphological operators simplify the input gray scale image by removing structures with respect to a predefined structuring element. Structuring elements have a unique structure with a known shape and size (e.g., a disk with a radius of 5 pixels). Therefore, morphological profiles can be produced using a sequence of opening and closing operations, where a structuring element of increasing size applied to a gray scale image to accurately extract spatial features [44]. In [47], an advanced version of opening and closing (opening and closing by reconstruction) was introduced to further improve the ability of the conventional opening and closing operators in terms of information extraction and shape preservation. Opening and closing by reconstructions satisfy the following criterion: If the structuring element cannot fit the structure of the image (objects), then it will be totally removed, otherwise, it will be totally preserved. Reconstruction operators remove objects smaller than structuring element without altering the shape of those objects and reconstruct connected components from the preserved objects. Figure 5 shows an image captured over the city of Zurich along with its corresponding opening, opening by reconstruction, closing and closing by reconstruction. The structuring element has two important parameters: shape and size. Different objects with various surroundings have different degrees of response when considering morphological profiles under different structuring element sizes and shapes. Therefore, morphological profiles can extract the spatial features under different scales and geometric properties. In this paper, the profile was produced by considering the intensity images of VV and VH as inputs. The investigated structuring element in this paper is circular with the diameters set to 1, 3, and 5. We named the morphological feature the "MP-Feature".

Classifiers
A CCF [48] was chosen to pursue a solution to the task of global local climate zone classification in this work. There are two reasons for this selection: (1) CCF, as a member of random forest methods, is a non-parametric algorithm with a low computation cost, which also provides the function of feature importance analysis; (2) CCF was proven to not only outperform other classifiers in computer vision [48] but also as highly effective in local climate zone classification [25].
To recall a CCF, necessary notations are introduced as follows. Let X = [x 1 , x 2 , ..., x N ] T ∈ R N×P denote the training data, with N instances and P features. Let Y = [y 1 , y 2 , ..., y N ] T ∈ Z N×C be the label of training data, where C is the number of classes. If y i , a C × 1-sized vector, indicates that data instance x i belongs to class two, the second element of vector y i equals one, and other elements are zero. Accordingly, the training data set is represented as D = {X, Y}. Similarly, let X = [x 1 ,x 2 , ...,x N ] T ∈ R N×P denote the data for the prediction andŶ = [ŷ 1 ,ŷ 2 , ...,ŷ N ] T ∈ R N×C denote the predicted label. Differing from the label in the training data, all elements in predicted label, vectorŷ i , ranges in [0, 1], represent the probabilities of every class thatx i falls under.

Canonical Correlation Analysis (CCA)
Canonical correlation analysis was designed to analyze the linear relation between sets of variables [49]. Let two data sets be represented as W ∈ R k×a and V ∈ R k×b , with k instances and the numbers of features as a and b, respectively. CCA pursues canonical coefficients P = [p 1 , p 2 , ..., p v ] ∈ R a×v and Q = [q 1 , q 2 , ..., q v ] ∈ R b×v so that data sets W and V are linearly mapped into a latent space (WP ∈ R k×v and VQ ∈ R k×v ) where they have maximum correlation. Canonical coefficients are coupled in a pair-wise fashion {p i , q i } and given by (1).
These projected data sets WP and VQ are situated in a v dimensional space. The largest v correlation coefficients associate with the 1 st to v th dimension of the space. The solution of the optimization problem (1) is boiled down to a generalized eigenvalue problem [50].

CCFs
As a CCF is an advanced version of random forests, it is introduced by first presenting the random forest and then explaining the improvements made by CCF.
Let RF = {t i } i=1,...,L denote random forests, where a random forest RF consists of L decision trees t i . An individual decision tree recursively divides a feature space by axis-aligned split until the pure node or stop criteria is achieved. Since a decision tree is a deterministic classifier, training on the same data results in identical trees. To introduce randomness, there are two general strategies among random forest methods. The first one is bagging, where subsets of the whole training data (X sub ∈ R N s ×P , N s < N) are randomly sampled with replacements for training decision trees [51]. The second strategy is to use random subspaces of original data (X sub ∈ R N×P s , P s < P) to train decision trees [52]. Both strategies decorrelate decision trees and create diversity in predictions. Statistically, diversity encourages a random forest to follow the Law of Large Numbers so that it does not suffer from overfitting as individual decision tree [48,53,54]. Most importantly, the diversity is essential to the power of random forest. At the final prediction phase, an averaging of outputs of all trees is applied for the regression task (2).
For the classification task, besides the majority voting of a unique label, a random forest can also provide probabilities of each class that data instancex falls under (3).
Regarding the CCF, there are two key improvements [48]. First, instead of splitting the feature space in axis-aligned manner, it finds the hyperplane in the projected space, where input features X ∈ R N×P and training label Y ∈ Z N×C have maximum correlation. This enables CCF to find natural class boundaries in the feature space instead of restricting it to the axis. The second improvement is called projection bootstrapping. Instead of bagging the training data, it takes in all training data. However, it selects training samples exactly as bagging does to find the CCA projections. Then, all training data are mapped into the canonical correlation space to find hyperplanes. In this fashion, diversity is introduced by bagging-learned CCA projections.

Feature Importance Analysis
To better understand how different features work for our global-scale LCZ classification task, the function of feature importance analysis in a CCF is applied to gain a quantitative insight. Before the analysis, the principle of feature analysis is recalled in this section.
Once a CCF is trained, it predicts the out-of-bag sample or the validation sample. With the ground truth label, one could estimate a prediction error E pred . To achieve the importance analysis of feature P i , only values of feature P i are randomly permuted in samples. Therefore, the trained CCF predicts the label of samples with permuted P i and achieves another prediction error E perm . Hypothetically, E pred should be smaller than E perm . The feature importance indication of P i is given by , where the higher the value of this indication, the more important is feature P i is [53].

Experiments and Discussion
In this section, the performance of different feature types were analyzed. In this context, firstly, the polarimetric feature and local statistical feature were quantitatively compared, and the one with better performance was later treated as a benchmark feature. Secondly, the chosen benchmark feature was combined with the texture feature and morphological feature, respectively, allowing the performance of the latter two types of features to be compared and analyzed. Afterward the feature importance was analyzed using the CCF. Lastly, the performance of the Sentinel-1 dual-Pol data was discussed regarding the application of local climate zone classification. For all following experiments, the training was based on 20 cities and the testing was conducted on 9 cities, as listed in Table 1. The performance of the classification approach is evaluated based on three quantitative indicators, overall accuracy (OA), kappa coefficient, and average accuracy (AA). OA is simply achieved by dividing the total number of correctly classified samples by the number of overall classified samples, and reported in percentage. Kappa coefficient is a statistical measurement, which estimates inter-rater agreement and is of no unit. AA is the average of all class-specific accuracies.

Benchmark Feature Selection
The Pol-Feature is the identical preprocessed dual-Pol data organized at the pixel level with a ground sampling distance of 10 by 10 ms. The Stat-Feature derives straightforward statistical parameters out of a neighborhood of 100 by 100 ms. The extent is identical to the resolution of the targeted local climate zone classification map. Although the Stat-Feature involves spatial information, it essentially describes the polarimetric feature as well. Therefore, our first experiment was to analyze performances of both features. The better one, in terms of OA and kappa, was selected as the benchmark feature, which is later combined with the GLCM-Feature and the MP-Feature for further analysis.
According to Figure 6, the Stat-Feature outperformed the Pol-Feature by 5.68% and 0.088, in terms of OA and the kappa coefficient, respectively. It also performed better generally on producer accuracies. Considering the challenge of global scale in our work, the difference is quite dramatic, but not surprising. It has been proven in remote sensing that involving spatial information can significantly improve the classification performance [55][56][57]. Consequently, the Stat-Feature was selected as the benchmark feature to work with the other two features.

Texture Feature
The speckle is omnipresent in SAR images as an intrinsic characteristic [58], which is normally regarded as noise during SAR data interpretation. However, when extracting texture information, the speckle becomes a valuable source containing rich texture information. To quantitatively test to what extent speckle filtering impacts texture, the GLCM-Feature was extracted from data sets with (GLCM-Feature-F) and without (GLCM-Feature-UF) speckle filtering.
To summarize results in Figure 7, firstly, by comparing classification performances of GLCM-Feature-F and GLCM-Feature-UF, speckle filtering led to a massive loss of texture information. Secondly, the GLCM-Feature downgraded the classification performance of the Stat-Feature. The reason could be as follows. (1) Originating from the GLCM design, the radiometric resolution is decreased to 32 statistically equalized bins for computational efficiency, thus causing information loss. (2) The equalized bin is statistically decided in the individual data set. For a global scale task, data sets collected from different locations, at different times, with different incident angles would have very diverse intensity responses in imageries. Therefore, the method of GLCM texture extraction is unable to ensure that data sets with the same textures appear the same in the feature space.

Morphological Feature
To set up comprehensive experiments, like GLCM-Feature extraction, the MP-Feature was applied to data sets with MP-Feature-F and without MP-Feature-UF speckle filtering. Therefore, the performance of the MP-Feature could be tested regarding speckle SAR data.
According to results shown in Figure 8, MP-Feature improved the classification by 3% in OA. Furthermore there was almost no difference in the performance between MP-Feature-UF and MP-Feature-F.  Figure 9 shows the feature importance achieved by canonical correlation forests using all test samples. The empirical importance indication reveals several interesting phenomenons. Firstly, the most important three features are components of the MP-Feature. The morphological spatial feature plays a very important role in our global LCZ classification task since LCZ classes describe the morphological property of an urban local neighborhood. Secondly, the VH polarized data contributes the most to the classification because eight out of top ten important features (all top six), are related to VH data. Thirdly, the coherence of VH and VV data also contribute the classification; however, it has often been ignored when using PolSAR data. Lastly, features on the relative phase between VV and VH barely provide any information because they are all among the least important features.  Figure 9. Feature importance obtained by CCF. Table 4 shows class-specific accuracies of different feature combinations. Table 5 gives detailed information about the combinations. In the comparison of feature combinations, class-specific accuracies, OA, and kappa suggest that classification using the Stat-Feature and MP-Feature derived from filtered data provided the best classification accuracy. By comparing feature combinations A and B, the Stat-Feature improved classification accuracies significantly. Morphological description was also very important for our classification task because adding the MP-Feature to the Stat-Feature further improved classification accuracies, by comparing combinations B and E. Table 4. The producer accuracies, OA, and kappa coefficient of the CCF classification approach on different feature combinations (as detailed in Table 5). The number of training and test samples are listed for different classes. The best accuracy for each class is shown in a bold typeface. The metric OA and class-specific accuracy are reported in percentages, and the Kappa coefficient is unitless.  Table 5. Experiments settings regarding feature combinations.

Feature Name Feature Combination Code A B C D E F
Since the task is very challenging in aspects of global scale and complicated LCZ classes, with the CCF strategy, Sentinel-1 dual-Pol data were not able to provide satisfactory classification accuracies.
However, there was valuable information on how Sentinel-1 dual-Pol data could contribute to this task. Despite GLCM is not suitable on extracting textures from multiple cities, the texture features of Sentinel-1 dual-Pol data did excel in distinguishing compactness and height, because the combination C provided better accuracies for the first six classes compared to other combinations. By comparing the accuracies of bare soil or sand in combinations B and C, one can determine that a SAR speckle texture is a good option for classifying this specific class.

Sentinel-1 Data for LCZ Classification
According to Table 4 and Figure 10, it is obvious that the classification accuracy is not satisfactory. With reference to the proposed working flow shown in Figure 10, the first ten urban classes are very hard to be classified with the individual use of Sentinel-1 dual-Pol data. The two classes, compact high-rise and the compact mid-rise, are confused with each other. In addition, most urban classes are confused into the class of large low-rise. Eventually only accuracies for the water, low plants, and dense trees are acceptable. However, as it is the first study of LCZ classification using Sentinel-1 dual-Pol data, the present outcome leaves rooms for improvement. 6

Conclusions
This paper describes the first attempt to produce local climate zone classification maps on a global scale using freely accessible Sentinel-1 dual-Pol data sets. The set of features, which were reported as being informative in the literature were assessed for our specific goal. We discovered that the local statistical feature excelled in optical classification and also well with dual-Pol SAR data. However, the GLCM feature, which has often been reported as highly informative in SAR classification, was not suitable for our global scale task due to the lack of a generalization capability since its dependency on a statistical distribution of an individual data set limits its transferability to other data sets. The morphological profile functioned quite well in our task, improving local statistical features, texture features by 3% and 9.29% in terms of OA. Morphological profiles extracted the spatial morphological structure of the data, which suited the essential content of local climate zone classes. According to the feature importance analysis of the CCF classifier, the VH polarized data had the biggest contribution to our classification task. The often-ignored feature, the coherence of VV and VH, contributed more to the classification compared to contributions of the VV polarised data. Moreover, the relative phase of VV and VH polarized data barely provided informative content for classifications, even dragging down the performance.
By far, classification accuracies of the Sentinel-1 data are not very appealing. However, considering the challenges of a global scale and transferability in our task, Sentinel-1 data still contributes in different manners for certain classes. The ensemble CCF strategy in this paper was chosen for the classification step due to its high generalization ability and superior performance over deep learning-based classifiers [8]. One potential reason for why CCF outperformed a deep learning-based classification method in [8] may have been due to the limited number of training samples. However, one deficiency of the CCF is that it demands hand-designed extracted features to further boost its classification performance. While endless combinations of features that can be fed to a CCF exist, this paper took the very first steps to evaluate the most informative sets of features and critically compare their performance for a global LCZ classification.
In the future, we intend to further examine this challenging task by investigating a deep learning strategy in order to take advantage of its capability of automatic feature extraction and selection. No matter which strategy is under consideration, one more critical detail should also be studied is that, how large is the neighborhood in remote sensing data is optimal for classifying LCZ classes. Regarding the data source, we also intend to investigate using both radar and optical remote sensing data, so that data fusion might contribute to this task. Last but not the least, we are interested to develop a solution to our global scale task, which has a strong capability on transferring generalization.