A Comparative Assessment of Machine-Learning Techniques for Land Use and Land Cover Classification of the Brazilian Tropical Savanna Using ALOS-2/PALSAR-2 Polarimetric Images

This study proposes a workflow for land use and land cover (LULC) classification of Advanced Land Observing Satellite-2 (ALOS-2) Phased Array type L-band Synthetic Aperture Radar-2 (PALSAR-2) images of the Brazilian tropical savanna (Cerrado) biome. The following LULC classes were considered: forestlands; shrublands; grasslands; reforestations; croplands; pasturelands; bare soils/straws; urban areas; and water reservoirs. The proposed approach combines polarimetric attributes, image segmentation, and machine-learning procedures. A set of 125 attributes was generated using polarimetric ALOS-2/PALSAR-2 images, including the van Zyl, Freeman–Durden, Yamaguchi, and Cloude–Pottier target decomposition components, incoherent polarimetric parameters (biomass indices and polarization ratios), and HH-, HV-, VH-, and VV-polarized amplitude images. These attributes were classified using the Naive Bayes (NB), DT J48 (DT = decision tree), Random Forest (RF), Multilayer Perceptron (MLP), and Support Vector Machine (SVM) algorithms. The RF, MLP, and SVM classifiers presented the most accurate performances. The NB and DT J48 classifiers showed a lower performance in relation to RF, MLP, and SVM. The DT J48 classifier was the most suitable algorithm for discriminating urban areas and natural vegetation cover. The proposed workflow can be replicated for other SAR images with different acquisition modes or for other types of vegetation domains.


Introduction
Land use and land cover (LULC) data are essential in several activities, including urban and regional planning [1,2], natural resources inventories [3,4], global environmental modeling processes [5], and monitoring of greenhouse gas emissions related to deforestation and forest degradation [6,7]. Although most of the LULC mappings in Brazil have been produced using optical remote sensing data [8,9], these data present limitations in tropical regions because of persistent cloud coverage. In this study, we considered the use of synthetic aperture radar (SAR) data, which are obtained by active systems operating in the microwave range of the electromagnetic spectrum. SAR images present high sensitivity to soil moisture content [10], surface roughness [11,12], and vegetation structure [13]; therefore, they are highly complementary in relation to the optical images that are strongly dependent [...] manifold target decomposition approaches, has been proposed to discriminate the major LULC classes from the Cerrado biome at a more refined legend level as compared to previous studies.

Materials and Methods
This section presents the methodological approach proposed in this study. Firstly, we present the description of the study site (location and major characteristics), and then the satellite data used in this paper. Next, the steps of preprocessing, image segmentation, attribute extraction, classification, and validation are detailed.

Study Area
The study area (3660 km² in surface area; 15°22´ south latitude; 47°32´ west longitude) is located in the eastern portion of Goiás State and in the northeastern sector of the Federal District, Brazil (Figure 1). The area corresponds to the boundaries of an ALOS-2/PALSAR-2 scene acquired in the StripMap (SM2) image acquisition mode. This area was selected because it presents representative LULC classes found in the Cerrado biome. Native vegetation, croplands, and pasturelands occur predominantly in the central part of the scene, while urban areas are located in the middle-southern part of the image [29,30]. Croplands are mostly represented by soybeans and corn, although some vegetables are produced under the center-pivot irrigation system [31]. The forestlands are composed of gallery forests, dry forests, and Cerradão [32]. The shrublands (shrub Cerrado, Cerrado shrubland, and dense Cerrado) correspond to a mosaic of different proportions of shrubs and trees that occur over a grass-dominated layer, while grasslands are composed of native grass species. The topography is mostly flat in the central part of the study area; along the north-south direction, there is a relatively rough terrain (Serra Geral do Paranã) with dominant folded and faulted metasediments [33].

Materials
This study was based on the polarimetric ALOS-2/PALSAR-2 L-band images obtained on 14 May 2016 in the StripMap (SM2), polarimetric, and High Sensitive mode (quad-pol, pixel size of 6 m, ascending orbit, incidence angle of 27.8°, and Single Look Complex (SLC), 1.1 processing level). We used SNAP 6.0 software for the SLC data preprocessing, eCognition Developer 8.7 software for image segmentation and attribute extraction, and WEKA 3.8 software for image classification.
Multispectral and panchromatic-pansharpened LANDSAT-8/Operational Land Imager (OLI) satellite images [34] obtained on 2 May 2016 and 3 June 2016, as well as the higher spatial resolution images available in the Google Earth and Bing platforms, were accessed through the QuickMapServices plugin available in QGIS 3.0 software and were then used for validation purposes. Other datasets utilized were: the annual and municipality-based agricultural production reports from 2015 (PAM-Produção Agrícola Municipal) [31]; the vector-based LULC data produced by the MapBiomas [30] and TerraClass [29] projects; and the daily rainfall data from the National Institute of Meteorology [35].

Approach
Figure 2 shows the main steps of this study, which involved ALOS-2/PALSAR-2 image preprocessing, a legend definition of the LULC map, image segmentation and classification, and validation.

Preprocessing
The ALOS-2/PALSAR-2 SLC data were converted into backscattering coefficients (σ⁰) using Equation (1) [36]:
$$\sigma^{0} = 10 \log_{10}\left(I^{2} + Q^{2}\right) + CF - A \qquad (1)$$
where I and Q are the real and imaginary parts of the SLC images, CF corresponds to the radiometric calibration factor (−83 dB), and A is the conversion factor (32 dB) [36]. For the speckle noise attenuation, we employed the Refined Lee polarimetric filter with an adaptive window size of 7 pixels × 7 pixels. This filter preserves the statistics and the linear features of the images [37].
The incoherent polarimetric parameters are derived from the power measurements in σ⁰ [38]. In this research, these parameters were generated to compose the set of attributes used in the machine-learning phase. The following indices were generated: Radar Vegetation Index (RVI) [39]; Radar Forest Degradation Index (RFDI) [40]; Canopy Structure Index (CSI); Volume Scattering Index (VSI); and Biomass Index (BMI) [41]. Parallel (co-pol) and cross-polarization (cross-pol) ratios were also generated [38].
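As an illustration of the calibration and index steps described above, the following minimal sketch converts SLC real and imaginary parts into σ⁰ (dB) and derives the five indices from linear-power backscatter. The CF and A values come from the text; the index formulas are the forms commonly cited in the literature and are an assumption here, since this section does not state them. The function names are hypothetical, and HV = VH (reciprocity) is assumed for VSI.

```python
import numpy as np

CF_DB = -83.0  # radiometric calibration factor given in the text
A_DB = 32.0    # conversion factor given in the text


def slc_to_sigma0_db(i, q):
    """Backscattering coefficient in dB from the real (I) and imaginary (Q) SLC parts."""
    power = np.asarray(i, dtype=float) ** 2 + np.asarray(q, dtype=float) ** 2
    return 10.0 * np.log10(power) + CF_DB - A_DB


def db_to_linear(sigma0_db):
    """Convert dB to linear power, the scale on which the indices are computed."""
    return 10.0 ** (np.asarray(sigma0_db, dtype=float) / 10.0)


def polarimetric_indices(hh, hv, vv):
    """Commonly cited index definitions (assumed, not taken from the text)."""
    bmi = (hh + vv) / 2.0
    return {
        "RVI": 8.0 * hv / (hh + vv + 2.0 * hv),
        "RFDI": (hh - hv) / (hh + hv),
        "CSI": vv / (vv + hh),
        "VSI": hv / (hv + bmi),  # assumes HV = VH
        "BMI": bmi,
    }


sigma0 = slc_to_sigma0_db(1000.0, 0.0)
indices = polarimetric_indices(1.0, 1.0, 1.0)
```

The indices are computed on linear power rather than on dB values, because ratios of dB quantities would not correspond to the power ratios the indices are meant to express.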
Target decomposition aims to represent scattering processes as a sum of independent elements related to the physical scattering mechanisms [42]. The methods of target decomposition are classified into coherent and incoherent types [42,43]. Coherent decompositions assume the existence of deterministic scatterers and that the backscattered wave is polarized. In general, this type of target decomposition uses the Jones scattering matrix to represent the polarization states of the electromagnetic wave. Incoherent decompositions assume that scattering is not deterministic, so the backscattered wave is partially polarized. In this case, the power reflection matrices (covariance and coherency matrices) are used to characterize the backscattered wave [37,44].
In remote sensing applications, the assumption of the occurrence of pure deterministic targets is invalid [44], so the power reflection matrices are often used. In this study, we used only incoherent methods. The following algorithms were considered: van Zyl (three components) [45]; Freeman-Durden (three components) [46]; Yamaguchi (four components) [47]; and Cloude-Pottier (three components: entropy (H), anisotropy (A), and α angle) [42]. The decompositions were generated directly from the SLC images using the SNAP 6.0 application and a window of 5 pixels × 5 pixels. Filters were not applied to the power matrices used in the polarimetric decompositions.
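To make the Cloude-Pottier components more concrete, the sketch below derives H, A, and the mean α angle from the eigen-decomposition of a single 3 × 3 coherency matrix. It assumes the matrix has already been averaged over a local window (such as the 5 × 5 window mentioned above); the function name is hypothetical, and this is not the SNAP implementation.

```python
import numpy as np


def cloude_pottier(t3):
    """Entropy H, anisotropy A, and mean alpha angle (degrees) of a 3x3 coherency matrix."""
    eigvals, eigvecs = np.linalg.eigh(t3)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    lam = np.clip(eigvals[order], 0.0, None)     # guard against tiny negative values
    vecs = eigvecs[:, order]

    p = lam / lam.sum()                          # pseudo-probabilities
    nonzero = p > 0
    entropy = float(-np.sum(p[nonzero] * np.log(p[nonzero]) / np.log(3.0)))

    denom = lam[1] + lam[2]
    anisotropy = float((lam[1] - lam[2]) / denom) if denom > 0 else 0.0

    # alpha_i is the arccosine of the magnitude of the first eigenvector element
    alphas = np.arccos(np.clip(np.abs(vecs[0, :]), 0.0, 1.0))
    alpha = float(np.degrees(np.sum(p * alphas)))
    return entropy, anisotropy, alpha


h, a, alpha = cloude_pottier(np.diag([2.0, 1.0, 0.5]))
```

Entropy close to 0 indicates a single dominant scattering mechanism, whereas values close to 1 indicate a mixture of mechanisms, which is the situation expected over vegetation canopies.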

Image Segmentation and Attribute Extraction
The calibrated polarized images in terms of backscattering coefficients, the incoherent polarimetric parameters, and the polarimetric decomposition parameters were orthorectified using the range Doppler model specific for SAR sensors. The 30-m spatial resolution digital elevation model (DEM) obtained by the Shuttle Radar Topography Mission (SRTM) was used in the orthorectification process.
For the segmentation of the SAR images (σ⁰_HH, σ⁰_HV, σ⁰_VH, and σ⁰_VV), a multiresolution algorithm based on region growing was used [48]. The algorithm considers a scale factor and a homogeneity composition, the latter being divided into color and shape. The shape, in turn, is subdivided into compactness and smoothness. The scale defines the size of the segments of an image, and the homogeneity composition tests the equality between segments [49]. Only one level of segmentation was generated, with parameters defined from several empirical tests. A scale parameter of 50 was selected, and weights were assigned to the criteria of homogeneity (shape = 0.10; color = 0.90; smoothness = 0.50; and compactness = 0.50).
After segmentation, the segment attributes were extracted. Among the various existing categories of attribute metrics, we selected the layer values of mean, standard deviation, and asymmetry, and the pixel-based minimum and maximum values. Thus, for each of the 25 images available (eight polarimetric parameters, 13 decomposition components, and four polarizations), the attributes of the two categories mentioned previously were extracted. Therefore, a set of 125 layers of attributes (five attributes for 25 images) was used in the machine-learning-based classifications.
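The attribute table described above (five statistics for each of 25 layers, giving 125 attributes per segment) can be sketched as follows. This is an illustrative NumPy version, not the eCognition procedure: the function name is hypothetical, and "asymmetry" is interpreted here as statistical skewness, which is an assumption.

```python
import numpy as np

STATS = ("mean", "std", "skewness", "min", "max")


def segment_attributes(layers, labels):
    """Per-segment statistics.

    layers: array of shape (n_layers, rows, cols)
    labels: integer segment ids of shape (rows, cols)
    Returns (segment_ids, table) with table shape (n_segments, n_layers * 5).
    """
    segment_ids = np.unique(labels)
    table = np.empty((segment_ids.size, layers.shape[0] * len(STATS)))
    for row, seg in enumerate(segment_ids):
        values = layers[:, labels == seg]            # (n_layers, n_pixels)
        mean = values.mean(axis=1)
        std = values.std(axis=1)
        centered = values - mean[:, None]
        safe_std = np.where(std > 0, std, 1.0)       # avoid division by zero
        skew = (centered ** 3).mean(axis=1) / safe_std ** 3
        stats = np.column_stack(
            [mean, std, skew, values.min(axis=1), values.max(axis=1)]
        )
        table[row] = stats.ravel()                   # layer-major column order
    return segment_ids, table


rng = np.random.default_rng(0)
demo_layers = rng.normal(size=(25, 8, 8))            # 25 synthetic input layers
demo_labels = np.repeat(np.arange(4), 16).reshape(8, 8)
ids, table = segment_attributes(demo_layers, demo_labels)
```

With 25 input layers, each row of `table` holds the 125 attributes of one segment, which is the form expected by the classifiers discussed in the next section.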

Classification and Validation
The following machine-learning classification algorithms were analyzed: NB, DT J48, RF, MLP, and SVM. An NB classifier employs the Bayesian theory dealing with conditional probability and predictions of events, with strong (naive) independence assumptions. NB assumes that the presence (or absence) of a given feature of a class is not related to the presence (or absence) of any other feature.
Depending on the precise nature of the probability model, NB classifiers can be trained very efficiently in a supervised learning framework. In many practical applications, parameter estimation for NB models uses the method of maximum likelihood; in other words, one can work with the NB model without assuming Bayesian probability or using any Bayesian methods. Despite their naive setting and apparently over-simplified assumptions, NB classifiers have performed quite satisfactorily in many complex real-world situations. An advantage of the NB classifier is that it requires only a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined and not the entire covariance matrix [50–52].
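The point that NB needs only per-class means and variances, rather than a full covariance matrix, can be illustrated with a minimal Gaussian NB sketch. It assumes normally distributed attributes and is not the WEKA implementation; the class name and the small variance floor are illustrative choices.

```python
import numpy as np


class GaussianNaiveBayes:
    """Minimal Gaussian NB: one mean and one variance per class and attribute."""

    def fit(self, x, y):
        x, y = np.asarray(x, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([x[y == c].mean(axis=0) for c in self.classes_])
        # small floor keeps the variance strictly positive
        self.vars_ = np.array([x[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_priors_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        diff = x[:, None, :] - self.means_[None, :, :]
        # independence assumption: log-likelihoods are summed over attributes
        log_like = -0.5 * (
            np.log(2.0 * np.pi * self.vars_)[None, :, :] + diff ** 2 / self.vars_[None, :, :]
        ).sum(axis=2)
        return self.classes_[np.argmax(log_like + self.log_priors_[None, :], axis=1)]


train_x = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]]
train_y = [0, 0, 0, 1, 1, 1]
model = GaussianNaiveBayes().fit(train_x, train_y)
predicted = model.predict([[0.2, 0.0], [4.8, 5.2]])
```

For 125 attributes, this stores 125 means and 125 variances per class, whereas a full covariance matrix would require 7875 distinct entries per class.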
The DT classifier (DT J48) consists of a graph that employs the "divide-and-conquer" approach to test attributes and assign classes to independent instances [53]. Basically, DTs are a non-parametric supervised learning method used for classification and regression. DTs learn from data to approximate a target function with a set of if-then-else decision rules. The deeper the tree, the more complex the decision rules and the closer the fit to the training data. A DT builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets, while at the same time an associated DT is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node has two or more branches. A leaf node represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. DTs can handle both categorical and numerical data. There are several steps involved in the building of a DT. The first one is the process of partitioning the dataset into subsets, named splitting. Splits are formed on a particular variable. The second one is pruning, which corresponds to the shortening of branches of the tree. Pruning is the process of reducing the size of the tree by turning some branch nodes into leaf nodes and removing the leaf nodes under the original branch. Pruning is useful because classification trees may fit the training data well but may do a poor job of classifying new values. A simpler tree often avoids over-fitting. Finally, the third step is tree selection, which is responsible for finding the smallest tree that fits the data. Usually this is the tree that yields the lowest cross-validation error [53].
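The splitting step described above can be sketched for a single numeric attribute. The example below scores candidate thresholds by information gain; this is a simplification, since J48 (the WEKA implementation of C4.5) ranks splits by gain ratio. The function names are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def best_threshold(values, labels):
    """Return (threshold, information_gain) of the best binary split on one attribute."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    base = entropy(labels)
    candidates = np.unique(values)
    best = (None, 0.0)
    # candidate thresholds are midpoints between consecutive distinct values
    for low, high in zip(candidates[:-1], candidates[1:]):
        threshold = (low + high) / 2.0
        left = labels[values <= threshold]
        right = labels[values > threshold]
        weighted = (left.size * entropy(left) + right.size * entropy(right)) / labels.size
        gain = base - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


threshold, gain = best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

A tree is grown by applying this search to every attribute at a node, splitting on the best one, and repeating on each subset until a stopping or pruning criterion is met.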
The RF algorithm was conceived by combining a large number of random DTs. Each tree contributes only one class vote for each instance, and the final classification is determined by the majority of the votes of all trees in the forest [54]. The trees in RF are created by drawing a subset of training data through a bagging approach. Bagging randomly selects, with replacement, about two-thirds of the samples from the training data to train these trees. This means that the same sample can be used in a training subset several times, while others may not be selected in a particular subset [55]. In the RF algorithm, there are two main parameters to be defined: the number of variables in the random subset at each node (mtry) and the number of trees (ntree). Rodriguez-Galiano et al. [56] conducted an empirical evaluation of the parameter "number of trees" and reported that differences in classification accuracy beyond a hundred trees are not meaningful; hence, we opted to use 100 trees in this work. Concerning the mtry parameter, the default value was adopted, which corresponds to the square root of the total number of features used in each experiment [57]. Other authors, however, rely on optimization procedures to assess the values of ntree and mtry, as described in [58,59]. RF has been shown to have several advantages in relation to other classifiers, since it is not based on strict parametric assumptions and is able to handle high-dimensional data and to deal with nonlinearity. However, RF has its limitations, such as longer computing time and higher algorithmic complexity as compared to an individual DT [60].
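The bagging, mtry, and voting ideas described above can be illustrated with a few standard-library helpers. This is only a sketch of the mechanics, not a complete RF and not the WEKA implementation; the function names are hypothetical.

```python
import math
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (the bag used to train one tree)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def default_mtry(n_features):
    """Square root of the number of features, as described in the text."""
    return int(math.sqrt(n_features))


def majority_vote(votes):
    """Final class for one instance: the label predicted by most trees."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
bag = bootstrap_indices(10_000, rng)
in_bag_fraction = len(set(bag)) / 10_000  # distinct samples that entered the bag
mtry = default_mtry(125)                  # 125 attributes in this study
winner = majority_vote(["forestlands", "forestlands", "pasturelands"])
```

Because sampling is done with replacement, roughly two-thirds of the distinct training samples end up in each bag, and the remaining samples are left out of that particular tree.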
The MLP is a feedforward artificial neural network (ANN) trained by the backpropagation method, designed to map a set of input vectors to a set of output vectors [61]. An ANN can be simply defined as a massively parallel distributed computational device consisting of processing units, also called neurons or nodes, which are organized in layers. The neurons are responsible for the storage of knowledge acquired within the system, which is then made available for further use [62]. MLPs learn fast with high generalization and have a strong self-learning ability [61,63]. They are composed of an input layer to receive the signal, an output layer that makes a decision or prediction about the input, and in between those two, an arbitrary number of hidden layers (or possibly none) that are the true computational engine of the MLP. MLPs with one hidden layer are capable of approximating any continuous function. These successive layers of processing units present connections running from every unit (neuron) in one layer to every unit in the next layer. The connections are responsible for passing information throughout the network, and they are characterized by weights, which are initially set in a random way and can be positive or negative [64]. All the neurons, except those belonging to the input layer, perform two simple processing functions: receiving the signal (activation) of the neurons in the previous layer and transmitting a new signal as the input to the next layer. The weights in the network can be updated from the errors calculated for each training example, which is called online learning. Alternatively, the errors can be saved up across all of the training examples, and the network can be updated at the end. This is called batch learning and is often more stable. The learning should be stopped when the validation set error reaches its minimum. At this point, the net is able to attain the best generalization [62,65]. If learning is not stopped, overtraining occurs, and the performance of the net is jeopardized. Once a neural network has been trained, it can be used to make predictions.
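The stopping rule described above (halt training when the validation error reaches its minimum) is usually implemented with a patience counter, because the minimum is only recognizable after the error starts rising again. The sketch below shows that logic on a list of per-epoch validation errors; the `patience` parameter and the function name are assumptions for illustration and are not taken from the study.

```python
def early_stopping_epoch(validation_errors, patience=5):
    """Return the epoch with the lowest validation error seen before stopping.

    The scan stops once the error has failed to improve for `patience`
    consecutive epochs, mimicking a training loop that halts at that point.
    """
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error, waited = epoch, error, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


errors = [0.90, 0.60, 0.40, 0.35, 0.37, 0.40, 0.45, 0.50, 0.60]
stop_at = early_stopping_epoch(errors, patience=3)
```

In a real training loop, the network weights saved at `stop_at` would be restored and used for prediction.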
The SVM is based on the concept of decision planes that define decision boundaries. A decision plane is one that separates sets of objects having different class memberships. According to [66], maximizing the margin on either side of a hyperplane that separates two data classes, and thereby creating the largest possible distance between the separating hyperplane and the samples, has been acknowledged to reduce the upper bound of the expected generalization error. SVM supports both regression and classification tasks and can handle multiple continuous and categorical variables. Thus, SVM is primarily a classifier that performs classification tasks by constructing hyperplanes in a multidimensional space that separate linearly and nonlinearly separable samples of different class labels [67,68]. This classifier is meant to maximize the distance between these hyperplanes and the class samples, in which the bordering samples are called support vectors. Multiclass problems are solved by pairwise classification. There are different algorithms to train an SVM, such as quadratic programming and the more efficient sequential minimal optimization (SMO), which uses heuristics to partition the training problem into smaller problems that can be solved analytically. The SMO implementation also replaces all missing values, transforms nominal attributes into binary ones, and normalizes all attributes by default, aiming at minimizing an error function.
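Pairwise (one-versus-one) classification, mentioned above for multiclass problems, trains one binary classifier per pair of classes and lets them vote. The sketch below shows only the pairing and voting logic for the nine LULC classes of this study; the binary classifiers are stand-in callables, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

CLASSES = [
    "forestlands", "shrublands", "grasslands", "reforestations", "croplands",
    "pasturelands", "bare soils/straws", "urban areas", "water reservoirs",
]


def pairwise_problems(classes):
    """All unordered class pairs; one binary SVM would be trained per pair."""
    return list(combinations(classes, 2))


def pairwise_predict(sample, binary_classifiers):
    """binary_classifiers maps (class_a, class_b) to a callable returning the winner."""
    votes = Counter(clf(sample) for clf in binary_classifiers.values())
    return votes.most_common(1)[0][0]


def make_stub(pair):
    """Stand-in classifier: prefers 'croplands' when present, else the first class."""
    return lambda sample: "croplands" if "croplands" in pair else pair[0]


pairs = pairwise_problems(CLASSES)
stubs = {pair: make_stub(pair) for pair in pairs}
label = pairwise_predict(sample=None, binary_classifiers=stubs)
```

For nine classes this yields 36 binary problems, each trained only on the samples of its two classes.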
According to [31], there are approximately 132,000 hectares of soybeans and 85,000 hectares of maize in the study area. Cultivated pastures, forestlands, and shrublands are the other major classes found in the study area [29,30]. Field surveys were carried out on 10-11 September 2017 along the BR-010 and GO-118 highways with the purpose of identifying the major LULC classes present in the study area (Figure 3). Thus, we considered the following representative LULC classes: forestlands; shrublands; grasslands; reforestations; croplands; pasturelands; bare soils/straws; urban areas; and water reservoirs.
Based on LANDSAT-8 and higher spatial resolution images available in the Google Earth and Bing platforms, 200 training segments were selected for each LULC class (except for grasslands, reforestations, and water reservoirs, with 25 samples each because of their limited occurrence in the study area). Another set of 959 segments was selected for validation purposes, according to approaches reported by [69]. A set of 1000 random and nonstratified points designed previously for the field campaign was considered. Forty-one segments were disregarded since they were located in hilly regions affected by layover or foreshortening effects associated with the radar image acquisition processes.
Thus, seven shapefiles were generated: five for training the classifiers (5, 25, 50, 100, and 200 samples per class), one for validation, and one for a classification using all 39,254 segments generated in the segmentation step. Each classification algorithm was trained five times, and the respective validations were carried out with the same set of 959 segments. The validations were performed based on the error matrices generated with the segments of each classification. The following validation metrics were used: global accuracy, Kappa index, conditional producer's accuracy (PA), and user's accuracy (UA). Hypothesis tests based on the standard normal distribution were also performed to compare Kappa indices and to evaluate the performances of the different classifications.
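The validation metrics listed above can all be derived from an error matrix. The following minimal sketch assumes rows hold the classified (map) labels and columns hold the reference labels; that orientation, the function name, and the toy matrix are assumptions for illustration only.

```python
import numpy as np


def accuracy_metrics(matrix):
    """Global accuracy, Kappa, PA, and UA from an error matrix.

    matrix[i, j]: number of segments mapped as class i whose reference class is j.
    """
    m = np.asarray(matrix, dtype=float)
    n = m.sum()
    diag = np.diag(m)
    po = diag.sum() / n                                      # observed agreement
    pe = (m.sum(axis=0) * m.sum(axis=1)).sum() / n ** 2      # chance agreement
    return {
        "global_accuracy": po,
        "kappa": (po - pe) / (1.0 - pe),
        "producers_accuracy": diag / m.sum(axis=0),          # per reference class
        "users_accuracy": diag / m.sum(axis=1),              # per mapped class
    }


toy_matrix = [[40, 10],
              [5, 45]]
metrics = accuracy_metrics(toy_matrix)
```

Kappa discounts the agreement expected by chance, which is why it is lower than the global accuracy for the same matrix.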

Results and Discussion
Based on the analysis of INMET data, we verified that there was no rainfall during the 43 days before the ALOS-2 overpass (14 May 2016). Therefore, soil moisture and plant water content likely had little influence on the SAR image considered in this study. Figure 4 shows Kappa indices according to different training sets and different classifiers, and Table 1 presents the mean and standard deviation of these Kappa indices. Overall, they increased gradually with the number of training samples. The MLP, RF, and SVM classifiers presented the highest Kappa indices (Kappa > 0.50), regardless of the number of samples. The maximum Kappa value was 0.68 for the SVM algorithm with 200 samples. Shiraishi et al. [70] also verified this behavior in their experiment with different training sets. Belgiu and Dragut [55] pointed out that the RF performs accurately for studies that employ few training samples and large attribute space.
The NB classifier outperformed the DT J48 classifier when fewer than 50 samples were used, and it performed similarly or worse when more than 50 samples were used. The NB classifier, which is widely recommended in the literature [67,68], showed a relatively low accuracy that was probably due to the high landscape complexity of the study area. Because of its relatively low computational costs, the NB classifier can be appropriate for inventories of large areas with more homogeneous land cover patterns. The use of several DTs, as in RF, yielded higher Kappa agreement indices than DT J48 in all training scenarios, which is in accordance with the results obtained by [55]. RF presented a performance similar to that of the more complex classifiers (MLP and SVM), regardless of the number of training samples. The same results were obtained by [70].
Figure 5 shows similar performances for the RF, MLP, and SVM algorithms in terms of UA and PA. This was also the case between the NB and DT J48 algorithms. The UA performance of the NB to discriminate reforestation and natural grasslands was high, with conditional Kappa indices of 0.65 and 0.70, respectively. These results were higher than those obtained by other classifiers.
Cerrado shrubland and shrub Cerrado presented a high degree of confusion with the forestlands and cultivated pastures, respectively. This confusion was somewhat expected, since the shrub Cerrado has landscape characteristics similar to those of the cultivated pastures in the study area in terms of biomass levels and vegetation structure (sparse shrubs over the grassland stratum); both had relatively low levels of backscattering coefficients in the L-band images. Depending on the degree of preservation of the shrub Cerrado, confusion with Cerrado shrubland often occurs due to the more prominent volumetric scattering [22,23,27]. Confusion also occurred between cultivated pastures and bare soil/straws, which was also expected because of the similar landscape conditions (low moisture content, lack of vegetation, and relatively smooth soil roughness). Figure 6 shows the results of the Kappa index and the global accuracy of different classifiers, involving nine LULC classes and 200 training samples.
Figure 7 presents the classification results obtained by the different classifiers. The NB classifier overestimated the urban areas, which can be ascribed to the presence of hilly terrain in the surrounding areas. The effects of layover and foreshortening on the ALOS-2 images from the study area probably generated some confusion in identifying urban areas. The DT J48 presented a more accurate identification of urban areas.
Table 2 shows the p-values of the Z tests performed for each pair of classifiers. The p-values highlighted in bold are higher than the level of significance adopted in the test (α = 0.05), which therefore indicates classifiers with the same level of performance (H0: Kappa_A − Kappa_B = 0; H1: Kappa_A − Kappa_B < 0; A and B = classifiers). The RF, MLP, and SVM classifiers were statistically similar in terms of Kappa agreement indices. The NB and DT J48 classifiers were also statistically similar to each other; however, both differed significantly from RF, MLP, and SVM and showed a lower performance in relation to them.
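The pairwise comparison of Kappa indices can be sketched as a one-sided Z test on the standard normal distribution, matching the hypotheses stated above. The variance function below is a first-order large-sample approximation and is an assumption, since the text does not state which variance estimator was used; the function names and the numbers in the usage lines are illustrative only.

```python
import math


def kappa_variance(po, pe, n):
    """Approximate large-sample variance of Kappa (first-order form, assumed here)."""
    return po * (1.0 - po) / (n * (1.0 - pe) ** 2)


def kappa_z_test(kappa_a, var_a, kappa_b, var_b):
    """One-sided test of H0: kappa_a - kappa_b = 0 against H1: kappa_a - kappa_b < 0."""
    z = (kappa_a - kappa_b) / math.sqrt(var_a + var_b)
    p_value = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # P(Z <= z)
    return z, p_value


# Illustrative values only, not results from the study
z_equal, p_equal = kappa_z_test(0.60, 0.0004, 0.60, 0.0004)
z_lower, p_lower = kappa_z_test(0.50, 0.0004, 0.68, 0.0004)
```

A p-value above 0.05 means the null hypothesis of equal Kappa indices is not rejected, which is how the statistically similar classifier pairs are identified.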
Similar tests involving five LULC classes presented similar results (Table 3). The DT J48 classifier used all input layers in its classification procedure (eight polarimetric parameters, 13 decomposition components, and four polarizations). Only the metrics applied to the segment varied. In other words, the mean, standard deviation, minimum value, and all other parameters were used. In the DT J48 classification, the classes with the best classification performance were urban areas and forestlands. The node-leaf attributes were, respectively, the cross-polarization ratio (HH/HV) and the VSI index. The use of this index to separate the forest formation is logical since it estimates the volumetric scattering of forest canopies [41]. Volumetric scattering components (from van Zyl, Freeman-Durden, and Yamaguchi theorems) were also listed at the DT J48 nodes. This is a quite coherent result since these components are related to the structure of the vegetation canopy [18,20,22,23,27]. The H, A, and α components and the CSI, VSI, BMI, and RFDI indices were also listed. The cross-polarization ratios were less thoroughly involved, and the amplitude cross-polarizations (HV and VH) were used, as well.

Conclusions
The methodology used in this study proved to be feasible for mapping LULC in the Cerrado biome. According to [71], the results of the classification are considered as "good" (NB and DT J48) or "very good" (RF, MLP, and SVM). Two groups of classifiers were identified. The first group, which obtained the best results, comprises the RF, MLP, and SVM algorithms, which presented statistically similar Kappa indices. The second group, which had less accurate performances, is composed of the NB and DT J48 classifiers, which also presented statistically similar Kappa indices.
The RF classifier outperformed DT J48, which agrees with the results found in the literature [55]. DT J48 is more complex, but it did not present more accurate results in comparison with those obtained by the NB classifier. However, the high accuracy of DT J48 in discriminating urban areas can provide thematic maps with higher accuracy in cases where the urban area is the main target of classification.
As for the polarimetric attributes, we verified that the decompositions were important in the identification and classification of LULC classes. Volumetric scattering components (van Zyl, Freeman-Durden, and Yamaguchi theorems) were used in the DT J48 classification. These components are related to the structure of the vegetation canopy [18–20,22,23,27]. However, a more detailed study should be performed to understand the mechanisms and types of scattering that prevailed in the scene, as well as those that are the most important components in the classifications. The so-called incoherent parameters also played an important role, especially CSI, VSI, BMI, and RFDI. The cross-polarization ratios were less important, whereas the cross-polarized amplitude images (HV and VH) should be highlighted.
For future studies, we suggest evaluating multitemporal SAR images and polarimetric and interferometric (PolInSAR) techniques. C-band SENTINEL-1 SAR data could also be tested separately or in combination with the ALOS-2/PALSAR-2 data. A prior unsupervised classification based on the H-α attribute space could also be tested to better understand the dominant scattering processes and, consequently, to define a better strategy for training the classifiers. A prior classification should also be available so that stratified sampling can be used for thematic validation.
Finally, in considering the complexity of the landscape in the study area and the levels of accuracy obtained, we conclude that the procedures and attributes used in this research can be extended to other types of vegetation domains.