Comparative Assessment of Machine Learning Methods for Urban Vegetation Mapping Using Multitemporal Sentinel-1 Imagery

: Mapping of green vegetation in urban areas using remote sensing techniques can be used as a tool for integrated spatial planning to deal with urban challenges. In this context, multitemporal (MT) synthetic aperture radar (SAR) data have not been equally investigated, as compared to optical satellite data. This research compared various machine learning methods using single-date and MT Sentinel-1 (S1) imagery. The research was focused on vegetation mapping in urban areas across Europe. Urban vegetation was classiﬁed using six classiﬁers—random forests (RF), support vector machine (SVM), extreme gradient boosting (XGB), multi-layer perceptron (MLP), AdaBoost.M1 (AB), and extreme learning machine (ELM). Whereas, SVM showed the best performance in the single-date image analysis, the MLP classiﬁer yielded the highest overall accuracy in the MT classiﬁcation scenario. Mean overall accuracy (OA) values for all machine learning methods increased from 57% to 77% with speckle ﬁltering. Using MT SAR data, i.e., three and ﬁve S1 imagery, an additional increase in the OA of 8.59% and 13.66% occurred, respectively. Additionally, using three and ﬁve S1 imagery for classiﬁcation, the F1 measure for forest and low vegetation land-cover class exceeded 90%. This research allowed us to conﬁrm the possibility of MT C-band SAR imagery for urban vegetation mapping.


Introduction
Remote sensing could provide reliable land-cover classification maps, through the active microwave and passive optical sensors, which could be used for a wide range of applications. The monitoring of urban vegetation at a regional scale has become an important topic, since urban development leads to a slow but steady degradation of urban green vegetation [1].
Urban areas are complex systems composed of numerous interacting components that evolve over multiple spatio-temporal scales [2]. In this context, a multispectral optical image is easy for interpretation and classification, but often climate conditions limit the utilization of this satellite imagery [3]. Conversely, synthetic aperture radar (SAR) systems are independent of weather and sun illumination and provide the all-weather mapping capability [4]. However, due to the coherent mode of backscattered signal processing [5], speckle noise cannot be avoided and will be present in SAR images [6]. The speckle noise degrades the quality of acquired imagery, causing difficulties for both manual and automatic image interpretation [7]. Therefore, speckle filtering is required for classification   listed in the Web of Science Core Collection containing the terms/topic "radar" or "scatteromet*" or "microwave*" or "SAR" for radar, and "optic*" or "Landsat" or "Sentinel-2" or "Sentinel-3" or "Quickbird" or "MODIS" or "IKONOS" or "GeoEye" or "WorldView" for optical imagery, refined by "land cover" or "land use". To extract a number of multitemporal related articles, the final results were refined by "multitemporal" or "multitemporal" or "multi temporal" or "time-series" or "time series".
Camargo et al. [46] and Lapini et al. [47] evaluated various classifiers using SAR imagery for LC classification of the Brazilian tropical savanna and forest classification in Mediterranean areas, respectively. Recent research presented RF, MLP, SVM, as the most accurate classifiers, and the DT J48 (DT-decision tree) classifier showed satisfactory performance for the detection of specific LC classes (e.g., vegetation). In contrast, in the latter study, RF achieved the best overall accuracy (OA), whereas SVM yielded a lower classification results due to the imbalanced number of samples among the classes. Waske and Braun [48] applied various classifier ensembles to MT C-band data for LC mapping. Classification accuracy of 84% was achieved in rural areas using RF classifier, which proved to be very well suited for LC classifications using MT stacks of SAR imagery.
The objective of this study was the mapping of vegetation in urban areas using MT C-band SAR imagery. Furthermore, this paper evaluated six different machine learning methods for classifying LC classes in three different study areas. The purpose of this research was to assess the possibility of vegetation mapping using MT S1 imagery in urban areas across Europe and on a related comparative assessment of different classifiers. The rest of the paper is organized as follows- (1) information about the study areas and SAR data used in this research; (2) description of pre-processing steps for S1 imagery and the tested classifiers for urban vegetation mapping; (3) results; (4) discussion, and (5) conclusions.   listed in the Web of Science Core Collection containing the terms/topic "radar" or "scatteromet*" or "microwave*" or "SAR" for radar, and "optic*" or "Landsat" or "Sentinel-2" or "Sentinel-3" or "Quickbird" or "MODIS" or "IKONOS" or "GeoEye" or "WorldView" for optical imagery, refined by "land cover" or "land use". To extract a number of multitemporal related articles, the final results were refined by "multitemporal" or "multi-temporal" or "multi temporal" or "time-series" or "time series".
Camargo et al. [46] and Lapini et al. [47] evaluated various classifiers using SAR imagery for LC classification of the Brazilian tropical savanna and forest classification in Mediterranean areas, respectively. Recent research presented RF, MLP, SVM, as the most accurate classifiers, and the DT J48 (DT-decision tree) classifier showed satisfactory performance for the detection of specific LC classes (e.g., vegetation). In contrast, in the latter study, RF achieved the best overall accuracy (OA), whereas SVM yielded a lower classification results due to the imbalanced number of samples among the classes. Waske and Braun [48] applied various classifier ensembles to MT C-band data for LC mapping. Classification accuracy of 84% was achieved in rural areas using RF classifier, which proved to be very well suited for LC classifications using MT stacks of SAR imagery.
The objective of this study was the mapping of vegetation in urban areas using MT C-band SAR imagery. Furthermore, this paper evaluated six different machine learning methods for classifying LC classes in three different study areas. The purpose of this research was to assess the possibility of vegetation mapping using MT S1 imagery in urban areas across Europe and on a related comparative assessment of different classifiers. The rest of the paper is organized as follows-(1) information about the study areas and SAR data used in this research; (2) description of pre-processing steps Remote Sens. 2020, 12, 1952 4 of 22 for S1 imagery and the tested classifiers for urban vegetation mapping; (3) results; (4) discussion, and (5) conclusions.

Study Areas
The study areas used in this research are shown in Figure 2. The first study area was Prague in the Czech Republic. The central part of the scene consisted of the urban area divided by the river Vltava. Most of the area in the south was agricultural land, either different types of crops or bare land, whereas the northern part of the scene was covered with forest, which separated the city from its outskirts. The second selected study area was Cologne, Germany. The western part was characterized by mainly flat terrain with agricultural fields and bare land areas, whereas eastern parts were dominated by forest areas. The central part of the scene was dominated by an urban area, with a lot of urban parks, lakes, and grasslands. Third, the considered study area was situated in Lyon, France. The city center with its surroundings was located in the western part of the scene, whereas other parts of the scene were dominated by vegetation and bare lands. Each study area had the same dimensions of approximately 30 km × 50 km, and the aforementioned areas were chosen because of a highly diverse landscape (more details about study areas are shown in Appendix A).

Study Areas
The study areas used in this research are shown in Figure 2. The first study area was Prague in the Czech Republic. The central part of the scene consisted of the urban area divided by the river Vltava. Most of the area in the south was agricultural land, either different types of crops or bare land, whereas the northern part of the scene was covered with forest, which separated the city from its outskirts. The second selected study area was Cologne, Germany. The western part was characterized by mainly flat terrain with agricultural fields and bare land areas, whereas eastern parts were dominated by forest areas. The central part of the scene was dominated by an urban area, with a lot of urban parks, lakes, and grasslands. Third, the considered study area was situated in Lyon, France. The city center with its surroundings was located in the western part of the scene, whereas other parts of the scene were dominated by vegetation and bare lands. Each study area had the same dimensions of approximately 30 km x 50 km, and the aforementioned areas were chosen because of a highly diverse landscape (more details about study areas are shown in Appendix A).

Data
The available S1 ground range detected (GRD) imagery with VV (vertical-vertical) and VH (vertical-horizontal) polarisations were acquired on the Sentinel Data Hub. For each study area, to ensure that the pixels remained unchanged in the same position over time, a reference date was chosen (i.e., 06 June 2019, 13 May 2019, and 04 June 2019 for Prague, Cologne, and Lyon, respectively). Since the constellation of Sentinel-1A (S1A) and Sentinel-1B (S1B) pass over the same spot on the ground every six days with identical orbit configuration (same image geometry), two scenes before and two scenes after the reference date (Table 1) in the same acquisition orbits were chosen for MT land-cover analysis (three scenes-MT_3; five scenes-MT_5).

Pre-Processing
To perform MT land-cover analysis using SAR imagery, several pre-processing steps are required. Pre-processing steps were executed with the Graph Processing Tools (GPT) of ESA's Toolbox (S1TBX). It included radiometric calibration, terrain correction, and co-registration.
For the quantitative usage of the S1 Level-1 imagery, a radiometric calibration needed to be applied. The result of the calibration was values that represented the radar backscatter of the reflecting surface. The calibration reversed the scaling factor applied during product generation and applied a constant offset and a range-dependent gain, including the absolute calibration offset. In this research, raw signals from the GRD products were calibrated to the sigma naught (σ 0 ) backscatter intensities.
GRD scenes have to be geocoded from a slant-range to a ground-range geometry, since the side-look view geometry of the SAR system, and Earth topography cause various distortions. Orthorectification of the S1 imagery (i.e., range doppler terrain correction operator) was conducted in the SNAP software, and the SAR scenes were terrain-corrected using the shuttle radar topography mission (SRTM) one-arcsecond tiles and were transformed to a universal transverse mercator (UTM) projection. The scenes were registered to a UTM Zone 33 N (Prague), Zone 32 N (Cologne), and Zone 31 N (Lyon), whereas WGS 1984 was used as an earth model.
In order to conduct LC classification in a time-series, image co-registration was needed to ensure that the images were spatially aligned. A set of images had to be aligned on a pixel scale, since wrong co-registration would produce incorrect LC mapping results [7]. For image registration, the used scene was reference dated for each study area as the master image, then the remaining images were registered to the base image.

Speckle Filtering
Prior to the land-cover classification of the S1 scene, speckle, appearing in SAR imagery as granular noise, needed to be filtered. For single speckle filtering in the spatial domain, many adaptive and non-adaptive filters were evaluated [49]. For this research, the Lee filter with a 5 × 5 window (Lee5) was used [10]. This filter assumed a Gaussian distribution for the noise and efficiently reduced speckle, while preserving the edges [50].
It should also be noted that the MT speckle filtering approach developed by Quegan and Yu [51] was tested in an experimental part of this research. The aforementioned filtering approach applied after the stacking of all scenes into one file. Using n co-registered images, the MT filter calculated n weighted averages while preserving the local mean backscatter in each image [52]. Since the MT speckle filter [51] did not produce a higher classification accuracy in comparison with the Lee5 spatial filter, a single pass of a spatial filter was applied to each scene. Similar results were reported in [3], which compared the performance of the spatial and MT filters using the MT SAR imagery. Although the MT filter could be used for deriving features in the spatial domain, the spatial speckle filter achieved a higher overall accuracy for classification applications.

Classification and Accuracy Assessment
After speckle filtering, performance evaluation of the land-cover classification was carried out using the six non-parametric machine learning methods. Prior to supervised pixel-based classification, reference polygon data were divided into the training data used to train the machine learning algorithms and validation data, in order to assess the accuracy of the LC classifications. The evaluated classifiers were random forests (RF), support vector machine (SVM), extreme gradient boosting (XGB), multi-layer perceptron (MLP), AdaBoost.M1 (AB), and extreme machine learning (ELM).
RF makes predictions by combining the results from many individual decision trees that were obtained by different subsets of the training data [53]. The main arguments that needed to be optimized were the number of decision trees to be combined (ntree) and the maximum number of features considered at each split (mtry). According to previous research by Noi and Kappas [54], ntree was 500, and the square root of the number of predictors was set for the mtry argument. Within R, the 'randomForest' package [55] was used for the RF classification.
For the SVM land-cover classification, we used the radial basis function (RBF), which takes predictor variables and applies a non-linear transformation to them [33,56]. The RBF kernel has two parameters that need to be set-the complexity coefficient C and the γ parameter, which is referred to as the kernel bandwidth. The optimal C parameter needed to be defined as a trade-off between error and margin, since the larger values lead to over-fitting and commonly require increasing computational time. The parameters mentioned above were investigated in depth for LC classification, using Sentinel-2 imagery in [54], and also in an experimental part of this research. Therefore, in order to reduce the computational time for the SVM classifier, C and γ were set to 1 within the 'kernlab' package [57].
XGB converts standard decision trees as weak learners into strong learners, using gradient boosting techniques. Developed by Chen and Guestrin [38], the boosting approach started with a high bias and then used the loss function to iteratively build trees that improve, compared to the errors of the prior trees. Some of the most important hyper-parameters within 'xgboost' package in R [58] that need to be optimized for XGB algorithm are the number of boosted trees (n_boost), and for over-fitting prevention-the learning rate (eta), tree complexity, and depth (max_depth), and a minimum sum of instance weight of all observations needed in a child (min_child_weight) [59]. Parameters n_boost, eta, max_depth, and min_child_weight was set to 100, 0.1, 6, and 1, respectively.
MLP consists of several layers of neurons that are fully connected with each other. The usual architecture of a model, which can separate nonlinear data, is the input layer, one or more hidden layers and the output layer [60]. Hyper-parameters of MLP include the number of hidden layers and the number of neurons in each layer (package 'keras' in R [61]). According to Heaton [62], two hidden layers were used since such a network can represent functions with any kind of shape, whereas the neuron numbers were set to 512 and 256. Backpropagation gives us detailed insights on how the weights and biases are learned at multiple layers within the network, in order to describe the overall behavior of the network [63].
From a collection of boosting ensemble methods for classification, Freund and Schapire's Adaboost.M1 (AB) [39] was chosen for the MT S1 imagery. The common goal of an AB classifier is to improve the accuracy by identifying weak learners based on the high weights and to create a strong classifier by boosting the ensemble method [64]. This research used the R package 'adabag' [64] for urban vegetation mapping, and both the number of iterations and the number of trees were set to 100.
The classification approach based on the extreme learning machine (ELM) classifier comprises a single-hidden layer in a feedforward neural network. The parameters of this learning algorithm (i.e., hidden nodes) were randomly chosen, and then the output weights of a hidden layer were computed [44]. Unlike the backpropagation neural network, for the ELM classifier, only the number of hidden nodes in the hidden layer needed to be optimized, and it was set to 1000, whereas the rectified linear unit was set as an activation function (package 'elmNNRcpp' in R [65]). The learning speed of ELM proved to be extremely fast, and one user-defined parameter could be easily optimized for the classification tasks [45].
According to the "good practice" recommendations defined by Olofsson et al. [66], the sampling design (detailed overview presented in [67]), response design, and analysis procedures are major components of the accuracy assessment methodology. To train and validate the LC classifications, a stratified random sample of 70% of the reference polygon data for training the machine learning methods and 30% of the reference polygon data for validating the accuracy of the results was used. The reference polygon data were collected by visual interpretation from a very high spatial resolution imagery (VHRSI) (e.g., WorldView-2/3, QuickBird) available via Google Earth and dated approximately the same as the S1 imagery [68,69]. Additionally, reference polygons were selected over the entire study area (approximately 30 km x 50 km) in such a way that there was no overlap between the training and testing sets (Table 2). Overall, the reference polygon area comprised approximately 4%, 3%, and 2% extent of the study area for Prague, Cologne, and Lyon, respectively. Independence between training and accuracy assessment polygon samples was assured by implementing a separate probability sample for accuracy assessment [70]. One of the challenges was to evaluate an agreement between the amount of training samples and their size for the LC classification. Valero et al. [71] reported that a smaller number of training data for the RF classifier produces lower classification accuracy results. On the other hand, the SVM classifier achieves very accurate results for even a small data set [72]. Additionally, during the training phase for the MLP, 10% of the training samples were selected as a validation data on which the loss function was evaluated at the end of each epoch [62].
An error or confusion matrix [70] compared the relationship between the reference and predicted data. Besides the overall accuracy (OA) and Kappa coefficient (K), the user's accuracy (UA) and the producer's accuracy (PA) were computed from the error matrix as an accuracy measure of individual Remote Sens. 2020, 12, 1952 8 of 22 LC classes [73]. Further, the F1 score [74], defined as the weighted harmonic mean of UA and PA was calculated using Equation (1). The performance of the urban vegetation classification was assessed using this measure. According to Sun et al. [75], the interpretation of the F1 score tended to be more relevant than the OA and K. The F1 score was calculated as follows: where PA is defined as the complement of the omission error probability, and UA is defined as the complement of the commission error probability. Besides using traditional methods for quantitative accuracy evaluation, e.g., the Kappa coefficient, which have certain limitations [76], another statistical LC method to determine accuracy, defined as the Figure of Merit (FoM), was calculated, as shown in Equation (2): where OA represents overall accuracy, O is the number of omissions, and C is the number of commissions.
To compare the performance of the machine learning methods, the same set of reference samples were used for accuracy assessment [77]. Since the reference data was not independent, the statistical significance of differences in accuracy between the two classification results was evaluated using the McNemar's Chi-squared test [78]. McNemar's test has been widely used for the comparison of classification results. It is based on a binary 2 × 2 contingency matrix, closely related to the χ 2 statistic, which could be adapted to compare multiple classifiers [79]: where f 12 and f 21 indicate the amount of correctly classified samples in classification map 1, but incorrectly in classification map 2, and vice versa. If the estimated χ 2 value was greater than 3.84 at a 95% confidence interval, the two classification methods would differ in their performances [60]. The accuracy assessment was conducted using the R programming language, version 3.6.0, through RStudio version 1.0.153.

Results
In order to assess the performance of the evaluated methods in different steps of the research (i.e., pre-processing of SAR imagery, number of input features), mean values of accuracy metrics for all three study areas were calculated. Table 3 shows OA, K for each machine learning method, as well as F1 and FoM for each land-cover class. Overall, the highest accuracy was achieved in the MT_5 scenario when the total number of input features was maximum. Using single-date imagery, the speckle filtering (VV_VH_SPK) scenario showed a better overall accuracy than the classification on the original S1 imagery (VV_VH). The Lee5 spatial filter reduced the speckle in the homogeneous areas and effectively preserved the edges and features, as shown in the research by Maghsoudi et al. [3] and Idol et al. [80]. In this part of the research (i.e., single-date imagery), an SVM classifier achieved the highest classification accuracy. When additional temporal S1 features were combined, the overall accuracy increased. All classifiers achieved better classification results in the MT_3 and MT_5 scenarios, except the ELM, whose accuracy decreased in the MT_3. Owing to the additional input data to train the model, the MLP classifier achieved the highest increase and overall the highest accuracy between LC classification, using single-date and MT imagery. Using MLP with multitemporal and multisource imagery, Kussul et al. [36] also outperformed commonly used machine learning methods for land-cover classification.  Figure 3 evaluates the performance of the tested machine learning methods. The SVM classifier performed better using the single-date S1 imagery, while the MLP performed better on the MT imagery when the number of input features was higher. If we compare boosting classifiers, AB performed better than XGB in the single-date classification scenario; conversely, XGB achieved better results in the MT scenario. The ELM classifier achieved the lowest classification results in this research. By introducing temporal information (i.e., five S1 imagery), the overall accuracy of all classifiers exceeded 90%, except for the ELM.  To assess the ability of differentiation between land-cover classes, omnibus measures (i.e., F1, FoM) that provide a single value were reported. However, along with these omnibus measures, Stehman and Foody [70] suggest reporting UA and PA, since their complementary measure (i.e., commission and omission error, respectively) are not interchangeable (Table 4 and Figure 4). As a stratified random sampling was chosen as a sampling design for this research, and LC classes were used as strata [66,70], UA and PA values for urban vegetation LC classes (i.e., forest and low vegetation) could be reported. In the VV_VH classification scenario, the MLP classifier yielded the To assess the ability of differentiation between land-cover classes, omnibus measures (i.e., F1, FoM) that provide a single value were reported. However, along with these omnibus measures, Stehman and Foody [70] suggest reporting UA and PA, since their complementary measure (i.e., commission and omission error, respectively) are not interchangeable (Table 4 and Figure 4). As a stratified random sampling was chosen as a sampling design for this research, and LC classes were used as strata [66,70], UA and PA values for urban vegetation LC classes (i.e., forest and low vegetation) could be reported. In the VV_VH classification scenario, the MLP classifier yielded the highest UA and PA value for the forest and low vegetation class, respectively. Conversely, the highest PA and UA value for the forest and low vegetation class was reached by the SVM classifier, respectively. In the VV_VH classification scenario, the MLP and SVM classifier correctly classified forest on the map that matched the ground truth data in terms of higher UA than PA, whereas MLP and SVM correctly identified more ground truth data as low vegetation, but the commission error (a complementary measure of UA) was much higher. After speckle filtering with Lee5 filter, SVM obtained the highest UA and PA values for forest and low vegetation class. When additional temporal S1 features were combined, the UA and PA values increased for individual LC classes. In the MT_3 and MT_5 classification scenarios, the forest and low vegetation class achieved the highest UA and PA values using the MLP classifier, and their values exceeded 90%. and SVM correctly identified more ground truth data as low vegetation, but the commission error (a complementary measure of UA) was much higher. After speckle filtering with Lee5 filter, SVM obtained the highest UA and PA values for forest and low vegetation class. When additional temporal S1 features were combined, the UA and PA values increased for individual LC classes. In the MT_3 and MT_5 classification scenarios, the forest and low vegetation class achieved the highest UA and PA values using the MLP classifier, and their values exceeded 90%.  Since it is possible to obtain higher classification accuracy using an imbalanced data sets [81], macro-averaged measures (i.e., F1, FoM, UA) were used for multi-class problems because it treats all Since it is possible to obtain higher classification accuracy using an imbalanced data sets [81], macro-averaged measures (i.e., F1, FoM, UA) were used for multi-class problems because it treats all classes equally [82]. A row-wise normalization was made within each confusion matrix [83], establishing a direct comparability between matrices in the study areas of different-sized sample populations [84] (Figure 4). Elements on the main diagonal inform us how well the map represents what is really on the ground, whereas off-diagonal elements are committed (i.e., false positive error) to other land-cover classes. Therefore, Figure 4 shows an increase in the UA for different classification scenarios of this research, and with respect to the machine learning method used. LC classification using original VV and VH polarisation data shows much noise in the final results. In Prague, many areas were omitted from the correct forest category to bare soil or water class, whereas in Cologne, the lowest UA of the low vegetation class was caused by the confusion with forest, and in Lyon, built-up areas were confused with low vegetation. Commission errors decrease with speckle filtering, but still, some misclassifications using single-date imagery remain (e.g., low vegetation with forest, built-up with low vegetation), which could be improved by using MT SAR data [85]. In the MT part of the research, UA for several land-cover classes significantly improved with additional temporal S1 features. Bare land and forest classes remained with high UA values, whereas built-up areas showed some confusion with forest class. Surprisingly, a large number of forest areas were classified as a water class in Prague, although confusion between water surfaces and forests does not usually occur on SAR imagery [23,24]. At a closer visual examination of the Prague classification map and according to the historical meteorological data [86], this could be due to the rainfall event that occurred during periods of acquired imagery for two S1 imagery (i.e., 06th June and 12th June 2019). This misclassification led to an overestimation of the water category. Through the change in the medium's dielectric constant, soil moisture had a major effect on the backscatter magnitude in terms of its increase up to 5 dB [87]. S1 MT imagery improved the classification of the low vegetation (i.e., grassland, shrubs) class, which reduced commission error with the forest and the built-up class. Figure 5 shows mean values for all machine learning methods evaluated in this research, with respect to the different classification scenarios. In the single-date S1 image analysis, an improvement of 20% and 0.24, in terms of the OA and Kappa values was achieved with speckle filtering. Further increase in the OA of 8.59% and 13.66% occurred with the use of three and five S1 imagery for LC classification, respectively. confusion between water surfaces and forests does not usually occur on SAR imagery [23,24]. At a closer visual examination of the Prague classification map and according to the historical meteorological data [86], this could be due to the rainfall event that occurred during periods of acquired imagery for two S1 imagery (i.e., 06th June and 12th June 2019). This misclassification led to an overestimation of the water category. Through the change in the medium's dielectric constant, soil moisture had a major effect on the backscatter magnitude in terms of its increase up to 5 dB [87]. S1 MT imagery improved the classification of the low vegetation (i.e., grassland, shrubs) class, which reduced commission error with the forest and the built-up class. Figure 5 shows mean values for all machine learning methods evaluated in this research, with respect to the different classification scenarios. In the single-date S1 image analysis, an improvement of 20% and 0.24, in terms of the OA and Kappa values was achieved with speckle filtering. Further increase in the OA of 8.59% and 13.66% occurred with the use of three and five S1 imagery for LC classification, respectively. In this research, the possibility of urban vegetation mapping was assessed by using various machine learning methods. In single-date image analysis, the SVM classifier achieved higher accuracy results than other classifiers ( Figure 3) and the potential for detecting vegetation in built-up areas ( Figure 6). In the MT classification scenario, when additional temporal information was introduced, MLP outperformed other classifiers. Therefore, Figure 6 shows a subset (2 km × 2 km) of each study area, with examples of built-up areas with surrounding urban vegetation (e.g., parks, urban gardens). Accuracy assessment was made over the entire study area (approx. 30 km × 50 km). These example subsets ( Figure 6) were chosen to demonstrate the possibility of vegetation mapping in complex systems, such as urban environments, in which mixed pixels pose the greatest challenge In this research, the possibility of urban vegetation mapping was assessed by using various machine learning methods. In single-date image analysis, the SVM classifier achieved higher accuracy results than other classifiers ( Figure 3) and the potential for detecting vegetation in built-up areas ( Figure 6). In the MT classification scenario, when additional temporal information was introduced, MLP outperformed other classifiers. Therefore, Figure 6 shows a subset (2 km × 2 km) of each study area, with examples of built-up areas with surrounding urban vegetation (e.g., parks, urban gardens). Accuracy assessment was made over the entire study area (approx. 30 km × 50 km). These example subsets ( Figure 6) were chosen to demonstrate the possibility of vegetation mapping in complex systems, such as urban environments, in which mixed pixels pose the greatest challenge (e.g., underestimation of the water class owing to the mixed pixels that have a subpixel land presence, as noted in [88]).
In this research, the SVM and MLP classifier achieved the highest OA and K ( Figure 3) for urban vegetation mapping in the single-date (i.e., VV_VH, VV_VH_SPK), and in the MT (i.e., MT_3, MT_5) classification scenario, respectively. Therefore, McNemar's χ 2 test was statistically used to compare the classification results achieved by SVM and MLP against other classifiers for each study area (Table 5). SVM is less often wrong than any other classifier in the single-date image analysis. However, it should be noted that in some classification scenarios, SVM and AB perform very similarly. This is shown in Table 5, as the χ 2 value indicates that two classifiers perform equally well with a probability of at least 95%. Using the MT SAR imagery, in Prague and Cologne, MLP achieved statistically different results from those produced by other classifiers. In Lyon, MLP yielded comparable classifications results to other classification methods, except for the ELM classifier. (e.g., underestimation of the water class owing to the mixed pixels that have a subpixel land presence, as noted in [88]). Figure 6. Example subset of each study area shown as Sentinel-2 "true-color" composite (left); classification map using single-date S1 imagery and support vector machine (SVM) classifier (middle); classification map using multitemporal imagery (five scenes) and multi-layer perceptron (MLP) classifier (right).
In this research, the SVM and MLP classifier achieved the highest OA and K (Figure 3) for urban vegetation mapping in the single-date (i.e., VV_VH, VV_VH_SPK), and in the MT (i.e., MT_3, MT_5) classification scenario, respectively. Therefore, McNemar's χ 2 test was statistically used to compare the classification results achieved by SVM and MLP against other classifiers for each study area (Table  5). SVM is less often wrong than any other classifier in the single-date image analysis. However, it Figure 6. Example subset of each study area shown as Sentinel-2 "true-color" composite (left); classification map using single-date S1 imagery and support vector machine (SVM) classifier (middle); classification map using multitemporal imagery (five scenes) and multi-layer perceptron (MLP) classifier (right).

Discussion
The current research evaluated the possibility of urban vegetation mapping using multitemporal (MT) C-band SAR imagery. Among the ML methods described in the literature [89], new machine learning methods (e.g., XGB, ELM) were tested in this research for classification tasks. Although many studies are based on the classification and interpretation of multispectral satellite imagery than those on SAR imagery, certain studies reported an increased overall classification accuracy using MT SAR imagery [85,[90][91][92]. The results obtained by the tested machine learning methods confirmed that dense time-series of C-band SAR imagery allow discrimination of green and forest areas in urban systems. In this research, UA and K were used in the assessment of classification performance calculated over the entire study area (Table 1, Figure 3). Single-date image classification (i.e., VV_VH, VV_VH_SPK) was also made so that classification performance using the MT imagery could be compared ( Figure 5). Using single-date data, the overall accuracy significantly increased with speckle filtering, which effectively preserved the edges and features. Similar results for LC mapping were also obtained in research by Idol et al. [80] and Lavreniuk et al. [93]. In the MT part of the research, the OA of a classification based on three (MT_3) and five (MT_5) S1 imagery was increased by 8.59% and 13.66% ( Figure 5), as compared to VV_VH_SPK, respectively. By increasing the number of S1 imagery to five (MT_5), the classification accuracy further increased, and according to [85], using more than five dates for LC mapping produces negligible changes in classification accuracy. Additionally, for the MT S1 classification, a single speckle filtering was conducted rather than MT speckle filter [51], since spatial speckle filters yield a higher overall performance, as reported in [3]. Mapping of vegetation in built-up areas (i.e., forest, low vegetation) showed a better classification accuracy based on MT imagery (Table 3 and Figure 4). We used F1 and FoM accuracy metrics as macro-averaged measures that were suitable for evaluating the accuracy of various land-cover classes [69,75,94]. Table 3 shows an improvement in different classification scenarios for discriminating various land-cover classes, especially forest and low vegetation (i.e., grassland, shrubs). As suggested in [70], if omnibus measures (i.e., F1, FoM) are reported, class-specific measures should also be included to characterize the accuracy of a given class. Therefore, the UA and PA values are presented in Table 4. Large omission and commission errors occur in the VV_VH classification scenario, due to the speckle noise [80]. The errors are partly reduced with speckle filtering, but it is found that the C-band of S1 imagery is less suitable to classify vegetation classes in urban areas than, e.g., L-band [95,96]. As shown in Table 4, within sub-optimal temporal windows (i.e., classification using MT imagery), the UA and PA values increased for the individual LC classes. Similar to the previous studies [97,98], our results indicated that MT S1 imagery improved the accuracy of the vegetation mapping.
Zhu et al. [99] used Landsat and SAR data for LC classification of urban areas. For urban and forest categories, the authors recommend the usage of SAR texture measures known as GLCM (gray-level co-occurrence matrix), explained by Haralick et al. [100]. Therefore, to improve classification of the urban vegetation and green areas, the inputs to the classifiers have a more important role [29,[101][102][103], than tuning the machine learning models. Haas and Ban [27] combined S1 and Sentinel-2 imagery for urban ecosystem mapping. Using an SVM classifier, 19 LC categories were mapped in complex urban areas. With the fused approach, some familiar misclassifications for SAR (classes with similar surface backscatter patterns, i.e., roads, runways, still water or lawns) and optical (classes with similar spectral reflectance) data could be reduced. Some classes are difficult to detect using a spectral response from optical data or backscatter from the SAR instrument, but this might be easily distinguished by their combined use [26,104,105]. Although F1 and FoM metrics are more robust than UA and PA [75,106], UA values, as a measure of the reliability of the map, were visualized ( Figure 4) for each study area. Irrespective of the accuracy metrics used in this research, the MLP method classified the forest and low vegetation class over 90%, in the MT_5 scenario (i.e., F1, and UA).
For urban vegetation mapping, the most used machine learning methods for the classification tasks were evaluated. Urban systems are comprised of built-up areas, vegetation, and water surfaces (e.g., lakes, rivers). The example subset of Prague ( Figure 6) emphasize the underestimated water extent location, due to the mixed pixels that have subpixel land presence [88]. In urban areas, these misclassifications pose a great challenge, which can be reduced by using MT imagery, or in combination with VHRSI [107,108]. Camargo et al. [46] used various machine learning methods for classifying several LC categories on ALOS-2/PALSAR-2 imagery. For nine LC classes and 200 training samples, the SVM classifier achieved the highest classification results with an OA and K of 74.18% and 0.68%, respectively. In our research, SVM also produced the best classification results in a single-date classification scenario (Figure 6), i.e., VV_VH and VV_VH_SPK, and the mean OA was 61.63% and 80.24%, respectively. The ability to apply an SVM classifier using a single SAR imagery has already been proven for LC classifications [109]. Zhong et al. [110] developed deep-learning-based LC classification for MT imagery. Similar to our research, MLP with two hidden layers and 512 neurons outperformed every non-deep learning model (i.e., XGB, RF, and SVM). Deeper MLP models did not improve the classification accuracy. In the aforementioned research, a one-dimensional convolutional neural network (CNN) achieved the highest classification results. CNNs should be further investigated for LC classification of the MT SAR imagery [111][112][113].
In this study, using MT S1 imagery for LC classification (i.e., MT_3 and MT_5), the MLP classifier achieved the highest classification results and the ability for vegetation mapping in built-up areas ( Figure 6). On the contrary, ELM produced the lowest results in every classification scenario. Kernel extreme learning machine (KELM) needs to be implemented for LC classification tasks on radar and optical imagery [60,114]. The aforementioned combined use of SAR and optical imagery in MT classification tasks yields many input features (e.g., texture measures, radiometric indices), which requires a high computational capacity. Feature selection techniques should be deeply investigated in order to reduce computational cost [29,96,104,115]. We used the McNemar's test in order to evaluate the significance of the differences between pair-wise classifications in each study area (Table 5).

Conclusions
In this research, we presented a comparative assessment of six machine learning methods using multitemporal (MT) SAR imagery for urban vegetation mapping. Our primary interest was to investigate the potential of S1 imagery for vegetation mapping in urban areas across Europe, since MT SAR data were not equally investigated, as compared to optical satellite data. The study revealed that discrimination of green and forest areas in urban and peri-urban areas increased with time-series of SAR imagery. Urban vegetation mapping using single-date imagery is often inefficient, and dense time-series of SAR imagery (e.g., S1) allows us to capture the phenological stages and to discriminate various land-cover classes. By using three and five S1 imagery for classification, the F1 measure for forest and low vegetation land-cover class exceeded 90%.
Furthermore, by evaluating various classification performance metrics, we selected the optimal classification method for vegetation mapping in the built-up areas. In the single-date image analysis, SVM produced the highest classification accuracy, whereas MLP yielded the best accuracy in all considered MT classification scenarios. For land-cover classification tasks using a single-date SAR imagery, SVM achieved very accurate results for even a small data set, whereas including more temporal dimensions of input data significantly improved MLP. Furthermore, mean values for all machine learning methods increased the overall classification accuracies, i.e., using three and five S1 imagery, by 49% and 58%, compared to single-date image analysis on the VV and VH bands, respectively.
This research allowed us to confirm the possibility of MT C-band SAR imagery for urban vegetation mapping. However, some deficiencies were present (e.g., mixing built-up areas with bare land or forest classes), so additional texture features or fusion with optical satellite imagery could be used along with C-band imagery. Furthermore, deep-learning classification techniques (e.g., CNN) should be thoroughly investigated for MT SAR imagery, as well as parameter optimization (e.g., k-fold cross-validation), in order to obtain the best classification performance.

Conflicts of Interest:
The authors declare no conflict of interest.