Integration of Sentinel-1 and Sentinel-2 for Classification and LULC Mapping in the Urban Area of Belém, Eastern Brazilian Amazon

In tropical regions, such as the Amazon, the use of optical sensors is limited by high cloud coverage throughout the year. As an alternative, Synthetic Aperture Radar (SAR) products can be used, alone or in combination with optical images, to monitor tropical areas. In this sense, we aimed to select the best Land Use and Land Cover (LULC) classification approach for tropical regions using Sentinel family products. We chose the city of Belém, Brazil, as the study area. Images of close dates from Sentinel-1 (S-1) and Sentinel-2 (S-2) were selected, preprocessed, segmented, and integrated to develop a machine learning LULC classification through a Random Forest (RF) classifier. We also combined textural image analysis (S-1) and vegetation indexes (S-2). A total of six LULC classifications were made. Results showed that the best overall accuracy (OA) was found for the integration of S-1 and S-2 data (91.07%), followed by S-2 only (89.53%) and S-2 with radiometric indexes (89.45%). The worst result was for S-1 data only (56.01%). In our analysis, the integration of optical products in the stacking increased the OA in all classifications. However, we suggest further investigation of S-1 products due to their importance for tropical regions.


Introduction
Land Use and Land Cover (LULC) data are important inputs for countries to monitor how their soil and land use are modified over time [1,2]. They also make it possible to identify the impacts of expanding urban environments on different ecosystems [3][4][5][6], to monitor protected areas, and to track the expansion of deforested areas in tropical forests [1,7,8,9].
Remote sensing data and techniques are used as tools for monitoring changes in environmental protection projects, reducing in most cases the costs of surveillance. An example is the LULC approach used for monitoring Reduced Emissions from Deforestation and Forest Degradation (REDD+) [10,11] and for ecosystem services (ES) modeling and valuation [12][13][14]. For the latter purpose, LULC mapping has been used to enhance the results found by Costanza et al. [15], who provided global ES values. In that research, the values have been rectified since their first publication [16,17]; the LULC approach provides land classes that allow ES to be estimated per unit area, making it possible to extrapolate ES estimates and values to greater areas and biomes around the world by using the benefit transfer method.

Data Source and Collection
The acquisition of both S-1 and S-2 products was performed in the Copernicus Open Access Hub platform, considering cloud coverage of less than 5% for the S-2 product and the date proximity of the S-1 product in relation to S-2 (one day of difference). The Planet Labs scenes were acquired through a contract of the Environment and Sustainability Secretariat of the state of Pará, Brazil. In order to cover the whole study area, two S-1 images were collected with the S-1 C-band SAR in Interferometric Wide Swath (IW) mode, in dual polarization (VV + VH), from 21 July 2017. The main characteristics of S-1 are described in Table 1. One scene of S-2A Level-1C (hereafter, L1C), with radiometric and geometric corrections, was acquired for this study. The S-2A L1C provides top-of-atmosphere (TOA) reflectance. The S-2A L1C has a radiometric resolution of 12 bits, a swath width of 290 km, and the wavelengths of its bands range from 443 nm to 2190 nm. The spatial resolution of the bands is distributed as (i) four bands of 10 m, (ii) six of 20 m, and (iii) three of 60 m. The selected image has 0% cloud cover and is from 20 July 2017. Apart from the SWIR/Cirrus band, which was used only for the atmospheric correction, all bands were used in the classification step.

Planet Imagery
Seventeen high-resolution Planet scenes acquired on 28 July 2017 were used to validate the RF classification. Since early 2017, the sun-synchronous orbit of this satellite constellation has provided a temporal resolution of one day, making it an excellent instrument for monitoring and data validation [60,61]. The specifications of the Planet mission for the images acquired are described in Table 2.

Data Analysis
The data processing is presented in the flowchart illustrated in Figure 2 and involves (i) preprocessing and data integration, (ii) product segmentation and RF classification, and (iii) accuracy assessment and validation.

Preprocessing Data
The preprocessing of the S-2 consisted of the atmospheric correction of the data, performed with the Sen2Cor algorithm [36,62,63] to obtain surface reflectance. All S-2 spectral bands were resampled to 10-m spatial resolution using the bilinear upsampling method and a mean downsampling method.
As already indicated, two S-1 images were also used, and a slice assembly technique was required to join them. A split of the subswaths IW1 and IW2 was applied to reduce the scene size, hence improving the processing time. The application of the orbit file, radiometric correction, thermal noise removal, and debursting was performed, as this is a well-consolidated methodology. We opted not to apply a speckle filter, using multilooking with a single look (5 m range looks and 20 m azimuth looks). Finally, a Range-Doppler terrain correction was applied, using the UTM WGS84 projection and the 30 m SRTM, with 10-m resampling to fit the integration requirements [27,36,37,64].
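The S-1 chain above maps onto SNAP's Graph Processing Tool (gpt). The fragment below is an illustrative sketch only: the operator names are SNAP's, but the exact parameters, operator order, and file names are assumptions and should be verified against the installed SNAP version.

```xml
<graph id="s1_preprocessing_sketch">
  <version>1.0</version>
  <node id="read">
    <operator>Read</operator>
    <parameters><file>S1_IW_SLC_scene.zip</file></parameters>
  </node>
  <node id="split">
    <operator>TOPSAR-Split</operator>
    <sources><sourceProduct refid="read"/></sources>
    <parameters><subswath>IW1</subswath><selectedPolarisations>VV,VH</selectedPolarisations></parameters>
  </node>
  <node id="orbit">
    <operator>Apply-Orbit-File</operator>
    <sources><sourceProduct refid="split"/></sources>
  </node>
  <node id="calibration">
    <operator>Calibration</operator>
    <sources><sourceProduct refid="orbit"/></sources>
  </node>
  <node id="noise">
    <operator>ThermalNoiseRemoval</operator>
    <sources><sourceProduct refid="calibration"/></sources>
  </node>
  <node id="deburst">
    <operator>TOPSAR-Deburst</operator>
    <sources><sourceProduct refid="noise"/></sources>
  </node>
  <node id="multilook">
    <operator>Multilook</operator>
    <sources><sourceProduct refid="deburst"/></sources>
    <parameters><nRgLooks>1</nRgLooks><nAzLooks>1</nAzLooks></parameters>
  </node>
  <node id="tc">
    <operator>Terrain-Correction</operator>
    <sources><sourceProduct refid="multilook"/></sources>
    <parameters>
      <demName>SRTM 1Sec HGT</demName>
      <pixelSpacingInMeter>10.0</pixelSpacingInMeter>
    </parameters>
  </node>
  <node id="write">
    <operator>Write</operator>
    <sources><sourceProduct refid="tc"/></sources>
    <parameters><file>S1_preprocessed.dim</file></parameters>
  </node>
</graph>
```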

Radar Textures and Multispectral Indexes
All of the derived information, for both S-1 and S-2 data, was calculated in the SNAP 6.0 software. For the S-1 product, we derived three Grey-Level Co-occurrence Matrix (GLCM) textural measures, with a 5 × 5 moving window, in all directions, based on the variogram method [65]. In general, the GLCM estimates the probability of pixel values (within moving windows) co-occurring in a given direction and at a certain distance in the image [66]. We computed the mean, variance, and correlation (Table 3). These GLCM statistics were applied to both VV and VH, generating a total of six products to be added to the RF classification [40,46,67]. Moreover, for the S-2 product, we estimated three normalized radiometric indexes (NDVI, Normalized Difference Vegetation Index; NDWI, Normalized Difference Water Index; and SAVI, Soil-Adjusted Vegetation Index) (Table 3).
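As a minimal illustration of the GLCM statistics listed in Table 3, the sketch below computes mean, variance, and correlation in Python with NumPy for a single horizontal offset; this is a simplification of the all-directions, variogram-based computation done in SNAP, not a reproduction of it.

```python
import numpy as np

def glcm_stats(window, levels=8):
    """GLCM mean, variance, and correlation for one horizontal offset (0, 1).

    The paper averages over all directions; a single direction keeps the
    sketch short. `levels` is the number of quantized grey levels."""
    lo, hi = window.min(), window.max()
    # quantize the window to `levels` grey levels
    q = np.floor((window - lo) / (hi - lo + 1e-12) * levels).astype(int)
    q = np.clip(q, 0, levels - 1)
    # co-occurrence counts for pixel pairs one column apart, symmetrized
    P = np.zeros((levels, levels))
    np.add.at(P, (q[:, :-1], q[:, 1:]), 1)
    P = P + P.T
    P /= P.sum()
    i = np.arange(levels)
    pi = P.sum(axis=1)                 # marginal distribution over rows
    mean = (i * pi).sum()              # GLCM mean (mu_x == mu_y when symmetric)
    var = ((i - mean) ** 2 * pi).sum() # GLCM variance
    ii, jj = np.meshgrid(i, i, indexing="ij")
    corr = ((ii - mean) * (jj - mean) * P).sum() / (var + 1e-12)
    return mean, var, corr
```

The statistics follow the formulas given below Table 3, with the simplification that the symmetric matrix makes the row and column means coincide.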

S-1 GLCM Textural Measures
In Table 3, NIR is the near-infrared band (842 nm for S-2) for NDVI, NDWI, and SAVI; Red is the 665-nm S-2 band for NDVI and SAVI; MIR (middle infrared) is the 2190-nm S-2 band for NDWI; P(i,j) is a normalized grey-tone spatial dependence matrix such that SUM over i,j = 0 … N − 1 of P(i,j) = 1, with i and j representing the rows and columns, respectively, for the Mean, Variance, and Correlation measures; µ is the mean for the Variance textural measure; N is the number of distinct grey levels in the quantized image; and µ_x, µ_y, σ_x, and σ_y are the means and standard deviations of p_x and p_y, respectively, for the Correlation textural measure.
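The three radiometric indexes follow their standard formulations; a minimal sketch, assuming the S-2 bands are given as NumPy reflectance arrays and using the common SAVI soil-adjustment factor L = 0.5 (a value the paper does not state):

```python
import numpy as np

def indexes(nir, red, mir, L=0.5):
    """NDVI, NDWI (Gao), and SAVI from S-2 surface reflectance.

    nir = band at 842 nm, red = 665 nm, mir = 2190 nm (per the Table 3
    notes); L is the SAVI soil-adjustment factor (assumed here)."""
    ndvi = (nir - red) / (nir + red)
    ndwi = (nir - mir) / (nir + mir)
    savi = (1.0 + L) * (nir - red) / (nir + red + L)
    return ndvi, ndwi, savi
```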

Image Stacking and Image Segmentation
For the image stacking of the S-1 and S-2 products, the nearest-neighbor resampling method was used. In the tool selected, two products were used, where the pixel values of one product (the slave) were resampled into the geographical raster of the other (the master) [71]. We used the S-1 product as the master and the S-2 data as the slave [27,32,37,63,72]. Subsequently, and similarly, the integration was made with S-2 and the vegetation and water indexes, S-1 and the GLCM textural measures, and the combination of all products generated [46].
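SNAP's collocation tool performs this resampling internally; the fragment below is a simplified nearest-neighbor sketch that assumes both grids share the same upper-left corner, ignoring the geocoding and offset handling that the real tool performs.

```python
import numpy as np

def nearest_resample(slave, slave_res, master_shape, master_res):
    """Resample `slave` (pixel size slave_res, in meters) onto a master
    grid of master_shape pixels at master_res, by nearest neighbor."""
    rows = np.arange(master_shape[0]) * master_res / slave_res
    cols = np.arange(master_shape[1]) * master_res / slave_res
    r = np.clip(np.round(rows).astype(int), 0, slave.shape[0] - 1)
    c = np.clip(np.round(cols).astype(int), 0, slave.shape[1] - 1)
    return slave[np.ix_(r, c)]
```

For example, a 20-m band resampled onto a 10-m master grid simply repeats each slave pixel roughly four times, preserving the original values without interpolation, which is why nearest neighbor is preferred when mixing radar backscatter with reflectance.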
To better aggregate pixels with similar values, a segmentation procedure was performed [32,73,74]. This procedure used only the collocation of the S-1 and S-2 products. For this, a local mutual best fitting region-merging criterion was applied, with the Baatz and Schäpe merging cost criterion selected [75,76]. A total of 82,246 segments were produced.

Random Points' Classification and RF Image Classification
After the segmentation process, all the other steps were developed in GIS software. In total, 1600 random points were defined to be overlapped by the segmentation polygons, and these were visually interpreted as one of twelve selected classes. The twelve classes were defined considering the potential attributed by the literature to the RF classification algorithm and the S-1 and S-2 synergy, as well as the particularities of the study area [27,37]. Table 4 shows how the classes were interpreted with colored compositions for S-1 and S-2. Table 4. Keys of interpretation to recognize the different LULC classes with S-1 and S-2 colored compositions.

[Table 4 content garbled in extraction; recoverable class codes: Agriculture (C1), Airport (C2), Bare Soil (C3), Beach (C4), Built-up (C5), and Water without Sediments (C12).] * The drawn polygons were the ones produced in the segmentation and classified for each class to perform the LULC classification.
The RF is described in the literature as an ML classification algorithm where users must choose a minimum of two parameters: (i) the number of trees to grow in the forest (N) and (ii) the depth of those trees (n) [50,52]. We implemented this procedure in the ArcGIS 10.4 software, where the RF classifier is named "Train Random Trees Classifier". In this classifier, the user must input the following parameters: (i) the satellite image to be classified, (ii) the training sample file (in shapefile format), (iii) the maximum number of trees (N), (iv) the maximum tree depth (n), and (v) the maximum number of samples per class (for which we used the default value of 1000). We also added the segment attributes of Color, Mean, Std, Count, Compactness, and Rectangularity, as they were options in the ArcGIS 10.4 RF algorithm.
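Outside ArcGIS, an equivalent setup can be sketched with scikit-learn's RandomForestClassifier; this is a stand-in for, not a reproduction of, the ArcGIS "Train Random Trees Classifier", and the feature matrix here is synthetic.

```python
# Sketch only: scikit-learn as a stand-in for the ArcGIS classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))       # per-segment features (bands, GLCM, indexes)
y = rng.integers(0, 12, size=600)   # labels for the twelve LULC classes

# N = 700 trees and n = 420 maximum depth mirror the values used in the paper
clf = RandomForestClassifier(n_estimators=700, max_depth=420, random_state=0)
clf.fit(X, y)
pred = clf.predict(X)
```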
To select the number of variables that provided sufficiently low correlation with adequate predictive power, tests were conducted to assess the best accuracy. All experiments were set proportionally to the default values proposed by Forkuor et al. [22]. The best possible scenario in which the classification was able to run in ArcGIS 10.4 was with 700 as the maximum number of trees (N), 420 as the maximum depth each tree could grow (n), and 1000 as the maximum number of samples per class; these values were applied to all six classifications. These numbers were the most significant possible, since the literature suggests that there is no standard value for the number of trees (N) and the number of variables randomly sampled as candidates at each split (n) [50,52].

Accuracy Assessment
As for validation assessments, we carried out a few tests to fully comprehend how the classification was defined and how good the results were. First, we investigated the mean and standard deviation of the spectral signatures for both S-1 polarizations and for all S-2 bands used in the RF classification. With this assessment, we could identify which bands had a greater distance (separation) in the analyzed samples. To evaluate the performance of the attributes and the accuracy of the maps produced, we performed statistical approaches based on the separability measures of Jeffries-Matusita (JM) and Transformed Divergence (TD) for the two polarizations of S-1 and all bands of S-2. These coefficients are based on the Bhattacharyya distance and range from 0 to 2, where values greater than 1.8 indicate quite distinct classes and values below 1.0 indicate classes that should be disregarded or grouped into a single class [77]. After understanding how the bands statistically separated and how different their spectral responses were, it is also possible to know how much they contributed to the classification (algorithm decision making). In this sense, we investigated the bands' contribution for each LULC classification produced.
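Under a Gaussian assumption for each class, the JM separability can be computed from the Bhattacharyya distance; the sketch below illustrates the statistic, not the exact implementation used in the paper.

```python
import numpy as np

def jeffries_matusita(x1, x2):
    """JM separability between two class sample sets (rows = samples,
    columns = bands), assuming Gaussian class distributions.

    Ranges from 0 (inseparable) to 2 (fully separable); values above
    1.8 indicate quite distinct classes."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    c1 = np.cov(x1, rowvar=False)
    c2 = np.cov(x2, rowvar=False)
    c = (c1 + c2) / 2.0
    d = m1 - m2
    # Bhattacharyya distance for Gaussian classes
    b = d @ np.linalg.solve(c, d) / 8.0 + 0.5 * np.log(
        np.linalg.det(c) / np.sqrt(np.linalg.det(c1) * np.linalg.det(c2)))
    return 2.0 * (1.0 - np.exp(-b))
```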
As a product of the random trees classifier in ArcGIS 10.4, we analyzed the Producer's and User's accuracies (PA and UA, respectively). The PA and UA are prior accuracies (computed from 60% of the training samples), produced by the trees selected in the model. We analyzed them for the six LULC classifications performed.
Finally, through cross-validation based on the collection and visual interpretation of classes (in Planet's high-spatial-resolution images) at 1232 random points, we computed the overall accuracy (OA) and the Kappa coefficient. These statistics allow us to analyze and assert the validity and accuracy of the results. We also ranked the OA and Kappa values from highest to lowest.
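The OA and Kappa computation from a confusion matrix can be sketched as follows (the matrix here is a hypothetical 3-class example, not the study's validation data):

```python
# Overall accuracy and Cohen's Kappa from a confusion matrix
# (illustrative sketch; the matrix below is hypothetical).
import numpy as np

def oa_and_kappa(cm):
    """OA and Kappa; rows = reference classes, columns = classified."""
    cm = np.asarray(cm, float)
    n = cm.sum()
    oa = np.trace(cm) / n
    # Expected chance agreement from the row/column marginals
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, kappa

cm = [[50, 3, 2],
      [4, 45, 6],
      [1, 5, 40]]
oa, kappa = oa_and_kappa(cm)
print(f"OA = {oa:.2%}, Kappa = {kappa:.4f}")
```

Kappa discounts the agreement expected by chance, which is why it is always lower than the OA it accompanies in Table 7.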

Results
The analysis of potentially separable classes through the VV and VH polarizations is presented in Figure 3. The low separability values result from surface backscatter, with separations on the order of 7 dB, 6 dB, and 9 dB for airport, beaches, and water with sediment, respectively. On the other hand, we also identified double-bounce backscatter (built-up areas) and volumetric backscatter (primary and secondary vegetation). These distinct backscattering mechanisms increase the likelihood of class identification through RF. However, for the other classes, whose box plots significantly overlap, the likelihood of distinguishing the variables decreases.
For the bands explored in the analysis of separation by the spectral response (surface reflectance) of each class (Figure 4), we could see that some classes tend to overlap, since their spectral responses are quite similar at different wavelengths. This was the case for agriculture, grassland, primary vegetation, and urban vegetation, whose dispersion curves have close values across wavelengths. High separability was noted for the following classes: mining, airport, water with sediments, and water without sediments. Classes whose spectral behavior differs from the others tend to produce better classification results, since the RF algorithm can grow trees and choose the classification variables with higher precision. The analysis of these backscattering and spectral patterns provides information for further monitoring of these classes in the study area.
The JM and TD separability results are presented in the matrix in Table 5. For the class separability in S-1, we identified good values (above 1.8) only for the separability of the airport from some other classes, of primary vegetation from water with sediments, and of water with sediments from water without sediments. In contrast, the class separability for the S-2 bands was significant for several classes, reaching the maximum value for both water categories and for primary vegetation. These separability results confirm that the potential of S-2 to identify a larger number of classes is significantly better than that of S-1. Figure 5 illustrates the six LULC classifications produced.
The contribution of each band to the six RF classifications is described in Figure 6A-F. Band 12 of the S-2 product (SWIR) was the only band that repeated as the most significant contributor (see Figure 6B,E): it was the main contributor for the classifications of S-2 only and of S-1 integrated with S-2. Thus, in all classifications containing S-2 bands, at least one of them was among the most significant contributors. For the classification with all products, the red band of S-2 had the highest contribution (0.0641); for S-1 together with S-2 and for S-2 only, it was SWIR band 12 of S-2 (0.1 and 0.1067, respectively); and for S-2 with indexes, it was SWIR band 11 (0.084). On the other hand, for the classifications that considered only SAR products, the largest contribution was from S-1 VV (0.6307) and, for S-1 with its textures, from the S-1 VH GLCM mean (0.1434).
The PA and UA results are presented in Table 6. The classifiers integrating optical with radar data and those using optical data only stand out with generally better PA results. The worst results were found for the classifiers without S-2, namely S-1 only and S-1 with GLCM secondary products; the UA followed a similar trend. The worst results by class were for agriculture and mining in the classifier that used radar and its textures, for which both PA and UA were 0%. However, the mining class achieved 100% PA in all classifiers that included S-2 bands. S-2 only, S-2 with indexes, and the integration of S-1 with S-2 had more than one PA equal to 100%: agriculture (C1) and mining (C8) repeated in all these classifications, and, for the integration of S-1 and S-2, the airport (C2) class also reached a PA of 100%. The only UA of 100% was for the identification of beaches in the S-2 with indexes classification.
Table 6. PA and UA for each class in the different types of RF classifications produced *.
Table 7 presents the OA and Kappa coefficients found in this research. The integration of the S-1 and S-2 products resulted in the most precise product (91.07% OA and 0.8709 Kappa) and, as expected, the S-1 product alone had the worst result of all the analyses (56.01% OA and 0.4194 Kappa). The inclusion of textures in the S-1 products improved the RF classification results (61.61% OA and 0.4870 Kappa); on the other hand, the inclusion of vegetation and water indexes in the S-2 product slightly reduced its OA (89.45%) and Kappa coefficient (0.8476) compared with S-2 alone (89.53% OA and 0.8487 Kappa). The integration of all the analyzed products produced the worst result among the combinations involving S-2 (87.09% OA and 0.8132 Kappa). However, the OA and Kappa results for the four best classifications were similar.

Discussion
Among the ML methods described in the literature [50], SVM and RF stand out for their good classification accuracy. These methods usually perform well compared with similar methods such as k-NN, or with more sophisticated ones, such as ANN and Object-Based Image Analysis (OBIA) methods [27,36,49]. Since these two ML algorithms are the most prominent, we frame the discussion around papers that used both.
We verified that the application of optical radiometric indexes and radar textures is widely accepted as a means of improving ML classification [11,27,36,46,63,78]. However, our OA when considering all bands in the classification was lower than for S-1 and S-2, S-2 only, and S-2 with indexes. These results show that the insertion of data derived from the optical image did not have a significant impact on the final classification, whereas the data derived from the SAR product, which improved the classification when considering S-1 only, contributed to the classification but not enough to raise the OA when all bands were considered. Some authors argue that major classification enhancements occur only when primary data, such as SRTM, are inserted into the dataset [63,64,79].
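As a reference for the kind of radiometric index discussed above, an NDVI layer can be computed directly from the red and NIR band arrays; a minimal sketch, with hypothetical reflectance values (the band arrays and function name are illustrative):

```python
# NDVI from NIR and red reflectance arrays (illustrative sketch;
# for S-2 these would be bands B8 and B4).
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """Normalized Difference Vegetation Index in [-1, 1]."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (nir - red) / (nir + red + eps)  # eps avoids division by zero

# Hypothetical 2x2 surface reflectance patches
nir = np.array([[0.5, 0.6], [0.4, 0.3]])
red = np.array([[0.1, 0.1], [0.2, 0.3]])
print(ndvi(nir, red))
```

Such derived layers are simply stacked as extra bands before the RF classification, which is why they can dilute rather than sharpen the result when they are highly correlated with the original bands.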
The lowest OA values were found for the S-1 products, in agreement with the literature [27,36,40,46,80]. SAR products, while having the advantage of penetrating clouds and consistently imaging the Earth's surface, struggle when forced to distinguish a large number of features (classes) [80,81]. In our study, this is noticeable in Table 5, in which JM and TD values were lower than 1.8 for almost all categories; the same can be seen in local studies on the Brazilian Amazon coast, even when using radar images of better quality than those of S-1 [82,83]. Discrimination of a large number of classes with radar is possible only through the application of advanced techniques, such as SAR polarimetry [84].
Maschler et al. [85], applying the RF classifier to high-spatial-resolution data of 0.4 m (HySpex VNIR 1600 airborne data), obtained excellent separability of the classes in the electromagnetic spectrum. From this good separability, they were also able to produce an excellent OA of 91.7%. Similar separability and results can be identified in our study. The ability to separate classes based on the training and validation samples is fundamental to producing satisfactory LULC classification results.
Among the authors who applied several ML methods (RF, SVM, and k-NN), Clerici et al. [36] considered the data fusion of S-1 and S-2 products. They also applied radiometric vegetation indexes (NDVI, Sentinel-2 Red-Edge Position index (S2REP), Green Normalized Difference Vegetation Index (GNDVI), and Modified SAVI) and textural analysis of S-1 in order to interpret their contributions to the supervised classification accuracy. Six classes were tested after a segmentation that grouped similar pixel values. Their results were considerably worse than ours: in their image stacking, they found an OA of 55.50% and a Kappa coefficient of 0.49. The isolated results of S-1 and S-2 were also worse than our integration. They suggest SVM for their study area because it presented better accuracy results.
Whyte et al. [27] applied the RF and SVM algorithms for LULC classification in the synergy of S-1 and S-2. These authors used the eCognition image processing software to produce segments and ArcGIS 10.3 for the RF and SVM classifications. They applied derivatives of both S-1 and S-2 to test the data combinations for LULC. They selected 15 classes and found better results when using all products (S-1, S-2, and their derivatives) with the RF algorithm: an OA of 83.3% and a Kappa coefficient of 0.72. Moreover, all the synergy scenarios produced higher results than using optical data only. This contrasts with our findings, since our results with derivatives were worse in almost all circumstances, except for the S-1 and GLCM stacking.
Zhang & Xu [49] also tested the fusion of optical and radar images with multiple classifiers (RF, SVM, ANN, and Maximum Likelihood). Optical images from Landsat TM and SPOT 5 were used, while the SAR images were from ENVISAT ASAR/TSX. The authors found that the best values were obtained with the RF and SVM classifications, and that the fusion of optical and SAR data contributed to the improvement of the classification, increasing the accuracy by 10%.
Deus [46] used the synergy of the ALOS/PALSAR L band and Landsat 5 TM and applied several vegetation indexes and SAR textural analyses. The author applied the SVM algorithm to obtain five LULC classes. The highest OA, 95%, was reached when only the features with the best performance in the classification were combined, including both PALSAR and TM bands and their derivatives.
Jhonnerie et al. [86] and Pavanelli et al. [40] also used ALOS/PALSAR and the Landsat family, considering 8 and 17 LULC classes, respectively. Both studies applied the RF algorithm: the former used ERDAS IMAGINE for the RF classification, while the latter used the R software. Both applied vegetation indexes and GLCM textural analysis. For the RF classification, both studies found their highest OA and Kappa coefficient results for the hybrid model: 81.1% and 0.760 [86], and 82.96% and 0.81 [40].
Erinjery et al. [11] also used the synergy between S-1 and S-2 and their derivatives to compare results from two classifiers, Maximum Likelihood and RF. The total number of classes was seven.
For the RF classification, an OA of 83.5% and a Kappa coefficient of 0.79 were found. They point to the inclusion of SAR data and textural features in RF classification as a way to improve classification accuracy. Similarly to our study, they found an OA lower than 50% when using only the S-1 product.
Shao et al. [78] integrated S-1 SAR imagery with GaoFen optical data to apply an RF classification algorithm to six LULC classes. They also produced SAR GLCM textures and vegetation indexes. Their best result, an OA of 95.33% and a Kappa coefficient of 0.91, was obtained when all the features were stacked. The use of S-1 only had the worst result, with an OA of 68.80% and a Kappa coefficient of 0.35.
Haas & Ban [37], in a data fusion analysis of S-1 and S-2, applied the SVM classification method. Considering 14 classes, they found an OA of 79.81% and a Kappa coefficient of 0.78. The authors suggested a postclassification analysis to improve accuracy, since their final objective was to obtain area values for the benefit transfer method of Burkhard et al. [17] in order to environmentally evaluate the urban ecosystem services of an area. Their number of classes was similar to ours, and our accuracy was superior.
In regional studies [2] and in global applications [87,88] of RF classification, the OA results were below ours: 75.17% [88], 63% for South America using RF [87], and 76.64% for Amazon LULC classification [2]. All these studies used satellite data with a temporal resolution of 16 days and a spatial resolution of 30 m. In contrast, we obtained results with a revisit time of five days for the optical and six days for the SAR product, and with a finer spatial resolution (10 m). We also produced a dataset with more classes (12), against the six [87] and seven [88] classes of the global studies.

Conclusions
In this work, the best result was found for the integration of the S-1 and S-2 products. In general, integrating the vegetation and water indexes and the SAR textural features decreased the OA and Kappa coefficient. The worst result was found for the S-1 only classification. The results largely agree with the literature. For our best classifications, the OA results were significantly greater than those reported for global and Amazon applications of RF classification.
Considering the PA, UA, and Kappa statistics, the integration of S-1 and S-2 presented the best results for the implemented ML technique. However, if on the one hand the GLCM textures increased the SAR product's accuracy, on the other hand the inclusion of vegetation and water indexes decreased the optical accuracy, when compared with the use of S-1 and S-2 alone, respectively. Lastly, the results found for the integration of all products were worse than those observed for the combination of S-1 and S-2 only.
Depending on the final aim of the LULC classification, it may be relevant to perform a postclassification analysis, because many spectral responses resemble each other and can confuse the ML process. However, in the best scenario produced, the accuracy found was satisfactory for several types of analysis. Furthermore, it is possible to use the data integration of S-1 and S-2 for LULC classification in tropical regions. It is noteworthy that few studies with a similar methodology were found in the literature for the southern hemisphere.
In this sense, the research findings can contribute to the current knowledge on urban land classification, since the methodology applied produces more accurate local data. Furthermore, the present results were obtained using a shorter revisit period and a higher number of land classes than previous global studies covering our study area. Future work should further explore S-1 and its derived variables, since long time series of optical data are difficult to obtain in tropical regions. Finally, we encourage the synergetic use of S-1 and S-2 for LULC classification, provided acquisitions of close dates are available.