Machine Learning-Based Cerrado Land Cover Classification Using PlanetScope Imagery

Rodrigues, Thanan; Takahashi, Frederico; Dias, Arthur; Lima, Taline; Alcântara, Enner

doi:10.3390/rs17030480

by Thanan Rodrigues ¹,*, Frederico Takahashi ², Arthur Dias ¹, Taline Lima ¹ and Enner Alcântara ³

¹ Federal Institute of Brasilia (IFB), Campus Riacho Fundo, Brasília 71805-722, DF, Brazil

² Brazilian Institute of Geography and Statistics (IBGE), Department of Environment and Geography, State Superintendency of the Federal District, Brasília 70297-400, DF, Brazil

³ Graduate Program in Natural Disasters (Unesp/CEMADEN), São José dos Campos 12245-000, SP, Brazil

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(3), 480; https://doi.org/10.3390/rs17030480

Submission received: 12 November 2024 / Revised: 2 January 2025 / Accepted: 27 January 2025 / Published: 30 January 2025

(This article belongs to the Section Ecological Remote Sensing)

Abstract

The Cerrado domain, one of the richest on Earth, is among the most threatened in South America due to human activities, resulting in biodiversity loss, altered fire dynamics, water pollution, and other environmental impacts. Monitoring this domain is crucial for preserving its biodiversity and ecosystem services. This study aimed to apply machine learning techniques to classify the main vegetation formations of the Cerrado within the IBGE Ecological Reserve, a protected area in Brazil, using high-resolution PlanetScope imagery from 2021 to 2024. Three machine learning methods were evaluated: Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost). A post-processing step was applied to avoid misclassification of forest in areas of savanna. After performance evaluation, the SVM method achieved the highest classification accuracy (overall accuracy of 97.51%, kappa coefficient of 0.9649) among the evaluated models. This study identified five main classes: grassland (GRA), savanna (SAV), bare soil (BS), samambaião (SAM, representing the superdominant species Pteridium esculentum), and forest (FOR). Over the three-year period (2021–2024), SAV and GRA formations were dominant in the reserve, reflecting the typical physiognomies of the Cerrado. This study successfully delineated areas occupied by the superdominant species P. esculentum, which was concentrated near gallery forests. The generated maps provide valuable insights into the vegetation dynamics within a protected area, aiding in monitoring efforts and suggesting potential new areas for protection in light of imminent anthropogenic threats. This study demonstrates the effectiveness of combining high-resolution satellite imagery with machine learning techniques for detailed vegetation mapping and monitoring in the Cerrado domain.

Keywords: protected area; AI; Cerrado; remote sensing

1. Introduction

The Cerrado is one of the richest terrestrial domains on Earth and possibly the most threatened tropical savanna in South America [1,2]. Human activities and land use often lead to habitat fragmentation, biodiversity loss, the spread of invasive species, soil erosion, water pollution, land degradation, and altered fire regimes, resulting in the loss of numerous ecosystem services [3,4]. More than half of the Cerrado has been rapidly transformed into pasture, agriculture, and other uses [5].

The vegetation of Cerrado presents phytophysiognomies that encompass forest, grassland, and savanna formations [6]. Both grassland and savanna together cover about 85% of the Cerrado phytogeographical province [7]. From a physiognomic perspective, forest refers to areas dominated by tree species characterized by a continuous canopy. The term savanna denotes regions where trees and shrubs are scattered over a grassy layer without forming a continuous canopy. In contrast, grassland describes areas predominantly covered by herbaceous species and some shrubs, with the absence of trees in the landscape [6].

The distribution of Cerrado’s flora is influenced by latitude, the frequency of wildfires, the depth of the water table, grazing, pedology [8,9], and numerous anthropogenic factors (such as land clearing for agricultural activities, selective logging, and the use of fire for pasture management, among others) [6]. The interactions between factors and their relative importance related to the distribution of various vegetation types, especially for transitional physiognomies, remain a subject of debate [10,11]. However, studies suggest that climatic and edaphic factors are most critical [12].

Beyond these controls on its distribution, the Cerrado flora is exposed to a number of invasive species, such as Melinis minutiflora P. Beauv. and Brachiaria decumbens Stapf, both of which were introduced as cattle forage [13]. Pteridium esculentum (G. Forst.) Cockayne (bracken), popularly known as “samambaião”, also exhibits invasive behavior and is expected to change the vegetation structure of savannas [14]. This behavior is described as “superdominant”: it refers to the intensified and unexpected proliferation of a native species that affects community composition, changes the environment, or alters ecosystem processes [15].

Invasive plants are also linked to fire events: according to [16], areas dominated by M. minutiflora can burn at higher intensities than natural Cerrado, probably because of their high biomass. Furthermore, M. minutiflora can also affect the dynamics of savanna–gallery forest boundaries, and it may limit native tree regeneration [17,18]. Fire is a common feature of the Cerrado, and it can be ignited by lightning or by people. However, rapid human occupation has changed the natural fire regime, which has consequences for vegetation structure and composition [19]. Thus, understanding the vegetation formations in conservation areas is crucial for identifying and managing exotic or invasive species, evaluating vegetation regeneration after disturbances, and planning prescribed fires. The removal of exotic grasses can potentially trigger the natural regeneration of open Cerrado vegetation, provided that native plants are still present [20].

The Cerrado is experiencing significant land-use changes, with approximately 45% of its native vegetation already converted for anthropogenic purposes and new deforestation fronts continuing to advance into the northern region of the biome [21]. Despite this, only 3.2% of the Cerrado is under strict protection. Even within these protected areas, changes in the surrounding landscape impact biodiversity and ecosystem functions, highlighting the critical need for ongoing monitoring to ensure biodiversity conservation [22].

Accurate mapping and monitoring of Cerrado physiognomies and their converted areas are essential not only to support the selection of new conservation areas but also to promote sustainable land use practices and enhance the understanding of Cerrado dynamics, including its impacts on carbon balance, nutrient cycling, and water resources [4,11,21,23]. To this end, data from several satellites have been used to map land use and land cover in the Cerrado domain at both the formation and physiognomy levels [11]; for example, a combination of Sentinel-1 and Sentinel-2 data was used to map native and non-native vegetation in the Brazilian Cerrado at the formation level with high accuracy (88.6–92.6%). The Landsat series has also been used with satisfactory results [21,23]. However, all of these are low- to medium-spatial-resolution instruments and do not resolve more detailed land use and land cover patterns [24]. To overcome this, Haddad et al. [25] used the PlanetScope constellation to classify phenological metrics of savanna physiognomies and obtained promising results.

Although this example shows encouraging outcomes, access to high-spatial-resolution data remains a bottleneck because such data are mostly commercial. However, through Norway’s International Climate and Forest Initiative (NICFI), users can access high-resolution PlanetScope mosaics for non-commercial uses. The mosaics cover the tropical forest regions from 30°N to 30°S and are intended to support reducing tropical forest loss, combating climate change, conserving biodiversity, and enabling sustainable development (https://www.planet.com/nicfi/).

Given the availability of high-resolution images, combined with the use of a machine learning-based approach, it is possible to generate vegetation maps at the formation level with high accuracy. This approach has the ability to learn from data and improve results through the use of algorithms that make predictions. Thus, this research aims to explore the potential of PlanetScope imagery combined with machine learning techniques to discriminate Cerrado formations within a protected area over a time series interval of four years (2021–2024).

2. Materials and Methods

2.1. Study Site

The IBGE Ecological Reserve (Figure 1) is in the suburban area of Brasília at an elevation of 1100 m in the core region of the Cerrado [26]. The reserve encompasses various Cerrado vegetation formations, such as grasslands, cerrado sensu stricto, forests, and gallery forests [27].

The area experiences a tropical climate, characterized as hot and semi-humid, with two well-defined seasons. Annual precipitation ranges between 1500 and 1700 mm, with rainfall concentrated between October and March. The winter is very dry, with average minimum temperatures of 15 °C in June–July and 26 °C during the hottest period, which corresponds to the spring–summer season [26].

The species Arundo donax L., commonly known as giant reed, is found in disturbed areas, such as roadsides, landfills, and construction sites, and its dispersal is facilitated by mechanical mowing [28]. M. minutiflora, known as molasses grass, is also present in the reserve. According to Martins et al. [29], it is considered an extremely aggressive invader that competes with native flora. It can alter the original vegetation physiognomy and, due to its high biomass production, can increase the likelihood of large fires during the dry season [30]. A superabundant species in the Cerrado is the bracken fern “samambaião” (P. esculentum), which has a clumping growth habit that hinders the development of other plants [31].

2.2. Field Data

Three field campaigns were carried out in the study area: one in May 2024 and two in June 2024, all in the autumn season. A total of 57 samples were acquired, each including photographs and a geographic position recorded with a GNSS (Global Navigation Satellite System) receiver. To increase the sample size for the classification process, additional polygons were created based on PlanetScope images from other dates.

2.3. Satellite Data and Feature Space

PlanetScope data with high spatial resolution (4.77 m per pixel) were used for classification. The data are freely available for the tropical belt (30°N–30°S) and are provided by Norway’s International Climate and Forest Initiative (NICFI) Imagery Program. A surface reflectance mosaic is available monthly, with spatially accurate data and minimized haze, illumination, and topographic effects. This data format has been available since September 2020 and comprises four spectral bands: blue (B), green (G), red (R), and near-infrared (NIR) [32]. The four bands were grouped together for each year, and to capture the spatial and temporal variability, images were stacked from April to September (2021 to 2024), representing the end of the wet season and the beginning of the dry season. This time interval was chosen because it is the period with the greatest contrast between the vegetation formations and the exposed soil. The bands provide relevant spectral information to discriminate the classes of interest.

The study site is covered by two tiles; however, only the one that covers more than 90% of the area (L15-0751E-0932N) was used. A total of 45 tiles from June 2020 to June 2024 were available, but after quality evaluation, only 39 tiles were downloaded, and they represent the composite images with the specific product named “planet_medres_normalized_analytic_year_month_mosaic”.

2.4. Machine Learning Classification Methods

In this study, the performance of three machine learning methods—Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM)—was evaluated for their effectiveness in recognizing Cerrado vegetation formation. Previous research has demonstrated the potential of these methods in similar tasks [4,11,33].

The RF is a non-parametric model that aggregates multiple decision trees. It creates multiple subsets of the original training data by bootstrap sampling, where samples are drawn randomly with replacement. Bootstrap aggregating, also known as bagging, does not weigh the samples; instead, all classifiers are given the same weights during the voting [34,35,36]. Each tree is built using a randomly sampled subset of the input features, and each tree votes for the most common class. The final classification of an input vector is determined by the majority vote among all the trees [37].
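The two ingredients described above, bootstrap sampling with replacement and an unweighted majority vote, can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the configuration used in the study; the sample indices and the per-tree votes are hypothetical, and the class codes follow the numbering introduced in Section 2.5.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) items at random with replacement (one bootstrap subset)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Return the class label that received the most (equally weighted) votes."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
training_rows = list(range(10))          # hypothetical training sample indices
subset = bootstrap_sample(training_rows, rng)

# Hypothetical votes cast by five trees for a single pixel
# (1 = GRA, 2 = SAV, 5 = FOR).
tree_votes = [2, 2, 1, 2, 5]
predicted_class = majority_vote(tree_votes)
```

Because sampling is done with replacement, a bootstrap subset usually contains repeated samples and omits others, which is what makes the individual trees differ from one another.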

XGBoost is also a supervised learning algorithm and an ensemble ML approach [38]. XGBoost is based on Gradient Boosting [39] and is designed to create efficient and robust predictive models. Training involves minimizing a loss function that quantifies the difference between the predictions from the model and the actual values. At each step, XGBoost adjusts the weights of the examples to give more emphasis to the cases that are difficult to classify [40].

SVM is a supervised, non-parametric learning method grounded in statistical learning theory [41]. The algorithm constructs a hyperplane that optimally separates data into a discrete set of predefined classes using training data [35,42]. When a hyperplane cannot be defined by linear equations, the data can be mapped into a higher-dimensional space using nonlinear functions [42]. A kernel function is then applied to identify complex decision boundaries between classes, transforming nonlinear limits in the original data space into linear ones in the high-dimensional space [43]. Kernel functions are typically categorized into four types: polynomial, linear, sigmoid, and radial basis functions (RBF). In this research, linear, polynomial, and RBF kernels were evaluated due to their effectiveness in remote sensing applications [44].
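For reference, the three kernels evaluated here can be written as plain functions. The sketch below follows the parameterization commonly used by scikit-learn (gamma, coef0, degree), which is an assumption about notation rather than something stated in the text; the two pixel vectors are hypothetical normalized (B, G, R, NIR) values.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, gamma=1.0, coef0=0.0):
    """K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Two hypothetical pixels with normalized (B, G, R, NIR) reflectance.
pixel_a = [0.05, 0.08, 0.06, 0.40]
pixel_b = [0.10, 0.12, 0.15, 0.25]

k_linear = linear_kernel(pixel_a, pixel_b)
k_poly = polynomial_kernel(pixel_a, pixel_b)
k_rbf = rbf_kernel(pixel_a, pixel_b)
```

The RBF kernel equals 1 for identical pixels and decays toward 0 as their spectral distance grows, whereas the linear kernel leaves the decision boundary a hyperplane in the original band space.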

2.5. Data Processing and Classification

The main data sources used in this research were PlanetScope imagery and in situ data (points and polygons representing the land cover classes). The steps can be found in Figure 2. First, all data were brought into the same coordinate reference system (EPSG:3857). The geographical coordinate points collected in the field were used to identify the main land cover classes in the study area. After that, a visual interpretation was performed using Google Earth and PlanetScope 2021 images to support the creation of polygons representing each land cover used in the classification process. The samples were selected randomly, considering the representativeness of each class and the requirement that each evaluated model could be run satisfactorily. For example, land covers such as bare soil and the bracken “samambaião” have fewer pixels available for the model to learn their spectral characteristics; nevertheless, the samples capture the variability of the class. Accuracy metrics can be used to track the performance of each class in the model. Furthermore, spectral separability was assessed with the Jeffries–Matusita (JM) distance, which measures the separability between the probability distributions of a pair of classes [45]. The JM distance ranges from zero, which indicates no separability (identical class distributions), to a maximum of 2, which indicates completely separable classes.
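A minimal sketch of the JM distance for one pair of classes is given below. It uses the common formulation based on the Bhattacharyya distance and assumes that each class is approximately multivariate normal in the feature space; whether the study used exactly this formulation is not stated. The reflectance samples are synthetic and the function name is illustrative.

```python
import numpy as np


def jeffries_matusita(class_a, class_b):
    """JM distance between two classes, each an (n_pixels, n_bands) array.

    Assumes each class follows a multivariate normal distribution.
    """
    mean_a, mean_b = class_a.mean(axis=0), class_b.mean(axis=0)
    cov_a = np.cov(class_a, rowvar=False)
    cov_b = np.cov(class_b, rowvar=False)
    cov_mean = (cov_a + cov_b) / 2.0
    diff = mean_a - mean_b

    # Bhattacharyya distance: mean-separation term plus covariance term.
    mean_term = diff @ np.linalg.solve(cov_mean, diff) / 8.0
    _, logdet_mean = np.linalg.slogdet(cov_mean)
    _, logdet_a = np.linalg.slogdet(cov_a)
    _, logdet_b = np.linalg.slogdet(cov_b)
    cov_term = 0.5 * (logdet_mean - 0.5 * (logdet_a + logdet_b))

    return 2.0 * (1.0 - np.exp(-(mean_term + cov_term)))


# Synthetic four-band samples (B, G, R, NIR) for three hypothetical classes.
rng = np.random.default_rng(0)
dark = rng.normal(0.20, 0.02, size=(500, 4))
similar = rng.normal(0.21, 0.02, size=(500, 4))
bright = rng.normal(0.60, 0.02, size=(500, 4))

jm_same = jeffries_matusita(dark, dark)        # identical samples
jm_close = jeffries_matusita(dark, similar)    # overlapping classes
jm_far = jeffries_matusita(dark, bright)       # well-separated classes
```

Identical samples give a distance of zero, overlapping classes give an intermediate value, and well-separated classes approach the upper bound of 2.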

The polygons were then used to train, validate, and test the models. The classes were numerically identified as integer values and followed the nomenclature and descriptions of Lewis et al. [11]. Due to the absence of more detailed field data, it was decided to work at the formation level (grassland, savanna, and forest), adding two other classes: bare soil and the superdominant bracken, here called samambaião. The class codes were 1: grassland (GRA); 2: savanna (SAV); 3: bare soil (BS); 4: samambaião (SAM); and 5: forest (FOR). GRA comprises open wet grassland and well-drained grassland; SAV comprises typical cerrado and palm swamp; BS comprises roads, bare soil, non-vegetated areas, and buildings; SAM is the superabundant species P. esculentum; and FOR comprises dense Cerrado woodland, riparian forest, and gallery forest. Among these classes, FOR is generally invariant over time, as are SAM and BS, unless an extreme event occurs. GRA and SAV are typical classes of the Cerrado and show seasonal variability throughout the year, mainly influenced by precipitation.

All samples were rasterized, and for each pixel, a land cover class was assigned. The rasterization process was performed using the open-source Rasterio library version 1.3.11 in a Python 3.10.12 environment. Furthermore, several other libraries (Pandas 2.2.2, Geopandas 1.0.1, NumPy 1.26.4, Seaborn 0.13.2, Matplotlib 3.8.4) were used to meet the specificities of each ML model and to deal with raster data and graphics creation. To ensure consistency in the analysis, the spectral values were normalized to the interval [0, 1], eliminating possible scale differences between the bands. Additionally, missing or anomalous values were treated, removing noise that could compromise the classification. The ML models were generally subjected to the same routine, including the following steps: (i) installation of the libraries needed for data manipulation, geospatial processing, and ML; (ii) sample division into training, validation, and testing sets; (iii) definition of the model’s parameters; (iv) PlanetScope image normalization; (v) model training, optimization, and evaluation; (vi) accuracy assessment; and (vii) saving of the output files.
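One common way to bring each band into the interval [0, 1] is per-band min-max scaling with missing values masked out. The text does not state which normalization formula was used, so the sketch below is an assumption; the nodata value of -9999 and the tiny two-band array are hypothetical.

```python
import numpy as np


def normalize_bands(stack, nodata=None):
    """Min-max scale each band of a (bands, rows, cols) array to [0, 1].

    Pixels equal to `nodata` become NaN and are ignored when computing
    the per-band minimum and maximum.
    """
    data = stack.astype("float64")           # astype returns a copy
    if nodata is not None:
        data[data == nodata] = np.nan
    band_min = np.nanmin(data, axis=(1, 2), keepdims=True)
    band_max = np.nanmax(data, axis=(1, 2), keepdims=True)
    # Guard against constant bands, which would otherwise divide by zero.
    span = np.where(band_max > band_min, band_max - band_min, 1.0)
    return (data - band_min) / span


# Hypothetical two-band image with one missing pixel flagged as -9999.
stack = np.array([[[0, 500], [1000, -9999]],
                  [[200, 400], [600, 800]]])
normalized = normalize_bands(stack, nodata=-9999)
```

Scaling each band separately removes scale differences between bands while leaving missing pixels as NaN so that they can be excluded before training.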

A Python code was used to access the Scikit-Learn library version 1.5.0 to configure an RF method. The model was first trained with the training dataset, and then the validation dataset was used for hyperparameter tuning. The GridSearchCV (Grid Search Cross-Validation) was applied to automate this tuning process, searching and finding the best combination of hyperparameters for a given model. The model was then readjusted based on the best combination of the hyperparameters, now using the entire training dataset. Finally, the optimized model was evaluated using the test dataset, which was not used at any previous stage of training or tuning, ensuring an unbiased and realistic evaluation of the model’s performance on new data. The same routine was carried out for XGBoost and SVM methods, changing only the parameters used in the respective methods (Table 1).
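The tuning routine can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' script: the data are synthetic stand-ins for rasterized samples, the parameter grid is hypothetical (the actual grids are those of Table 1), and GridSearchCV here draws its validation folds from the training set by cross-validation instead of using a separate validation set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for rasterized samples: 4 features (B, G, R, NIR), 3 classes.
rng = np.random.default_rng(42)
centers = np.array([[0.1, 0.2, 0.1, 0.6],
                    [0.3, 0.3, 0.4, 0.4],
                    [0.6, 0.5, 0.6, 0.2]])
labels = rng.integers(0, 3, size=600)
features = centers[labels] + rng.normal(0.0, 0.03, size=(600, 4))

# Hold out a test set that is never seen during training or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0)

# Hypothetical grid: 2 x 2 x 2 = 8 candidates, 3 folds each = 24 fits.
param_grid = {
    "max_depth": [5, 10],
    "min_samples_split": [2, 5],
    "n_estimators": [50, 100],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)   # refit=True retrains the best model on all training data

best_params = search.best_params_
test_accuracy = accuracy_score(y_test, search.best_estimator_.predict(X_test))
```

The number of fits reported by a grid search is the number of parameter combinations multiplied by the number of folds, which is how totals such as 162 or 192 fits arise.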

2.6. Post-Classification

After classification, a post-processing step was carried out to avoid misclassification of forest. Forest areas are generally located near rivers or occur as small isolated fragments [2]. Because of their location along watercourses, these forests are known as gallery forests. According to [6], gallery forest is almost always bordered by strips of non-forest vegetation on both sides, and there is generally an abrupt transition to savanna and grassland formations. Therefore, the Normalized Difference Fraction Index (NDFI) was used to assist in creating a mask that delineates only forest areas [46]. The index is computed from the fraction images (green vegetation, GV; non-photosynthetic vegetation, NPV; soil; and shade) obtained through spectral mixture analysis. For each fraction, four representative pixels were manually collected in the 2021 images (April to September) for each band (blue, green, red, and NIR), yielding the candidate endmembers. The mean of the candidate endmembers was used to derive the fractions, and the NDFI was computed using the NumPy, Rasterio, and SciPy 1.13.1 libraries. After that, a Principal Component Analysis (PCA) was applied to reduce the dimensionality of the monthly NDFI images, and the first component, which retained most of the original variance, was used to support the construction of the forest mask. A threshold representing the majority of forested areas (>0.4) was then selected, and isolated pixel clusters up to four pixels wide were removed from the mask. Larger clusters can be interpreted as tree formations, but they are not necessarily natural species of the FOR class. After noise removal, the edges of the identified regions were smoothed, and the largest area was isolated, preserving important pixels to ensure that valid areas were not inadvertently discarded. To derive the final land cover images, the ML model chosen in the previous steps was applied to the mosaics from each year. After that, the forest mask was loaded into the Python pipeline, and a simple rule was implemented: pixels classified as FOR that fall outside the mask are reclassified as SAV.
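The index and the reclassification rule can be sketched with NumPy as shown below. The NDFI expression is the commonly cited formulation with shade-normalized green vegetation, and the sketch assumes fractions scaled to [0, 1]; the function names, the 3 × 3 classified map, and the mask are hypothetical, and the class codes are those listed in Section 2.5.

```python
import numpy as np

SAV, FOR = 2, 5  # class codes listed in Section 2.5


def ndfi(gv, npv, soil, shade):
    """NDFI from fraction images scaled to [0, 1] (assumed scaling).

    Pixels with shade == 1 would divide by zero and need masking beforehand.
    """
    gv_shade = gv / (1.0 - shade)            # shade-normalized green vegetation
    return (gv_shade - (npv + soil)) / (gv_shade + npv + soil)


def apply_forest_mask(classified, forest_mask):
    """Reclassify FOR pixels that fall outside the forest mask as SAV."""
    outside_forest = (classified == FOR) & ~forest_mask.astype(bool)
    return np.where(outside_forest, SAV, classified)


# Hypothetical fractions for a single vegetated pixel.
ndfi_value = ndfi(np.array([0.6]), np.array([0.1]),
                  np.array([0.1]), np.array([0.2]))

# Hypothetical classified map and forest mask (1 = inside the mask).
classified = np.array([[5, 5, 2],
                       [1, 5, 4],
                       [3, 2, 5]])
forest_mask = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 0]])
corrected = apply_forest_mask(classified, forest_mask)
```

Only pixels labeled FOR are touched by the rule; every other class keeps its label whether or not it lies inside the mask.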

2.7. Accuracy Assessment

Accuracy assessment was the final step of the model evaluation. After optimization, the test dataset was applied to each ML method, and their performances were tracked by the following metrics: confusion matrix, producer accuracy (PA), user accuracy (UA), overall accuracy (OA), and kappa coefficient. The OA represents the proportion of correctly classified instances out of the total instances in the dataset. A low OA indicates that the model is making many incorrect predictions, whereas a high OA indicates that the model correctly classifies most of the instances [47]. However, the OA does not provide detailed insights into the errors, so an additional metric, the F1-score, was considered. The F1-score is a single metric suited to imbalanced datasets, in which the classes are not equally represented [48]. It is the harmonic mean of precision and recall, which measure, respectively, the proportion of positive predictions that are correct and the ability of the model to identify all relevant instances. The selection of the most suitable model for image classification was based on the accuracy assessment results. The key criterion for choosing the model was to achieve the combination of the highest average values of F1-score and OA. The selected model was then applied to PlanetScope images considering the period between 2021 and 2024.
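All of these metrics can be derived from the confusion matrix alone. The sketch below assumes that rows hold the reference classes and columns hold the predicted classes (the layout of Tables 2–4 is not described in the text), and the two-class matrix is a made-up example.

```python
import numpy as np


def accuracy_metrics(confusion):
    """Accuracy metrics from a confusion matrix.

    Assumed layout: confusion[i, j] = pixels of reference class i
    that were predicted as class j.
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    diagonal = np.diag(cm)
    reference_totals = cm.sum(axis=1)
    predicted_totals = cm.sum(axis=0)

    producer = diagonal / reference_totals       # PA (recall)
    user = diagonal / predicted_totals           # UA (precision)
    overall = diagonal.sum() / total             # OA
    expected = (reference_totals * predicted_totals).sum() / total ** 2
    kappa = (overall - expected) / (1.0 - expected)
    f1 = 2 * producer * user / (producer + user)
    f1_weighted = (f1 * reference_totals).sum() / total

    return {"OA": overall, "PA": producer, "UA": user,
            "kappa": kappa, "F1": f1, "F1_weighted": f1_weighted}


# Made-up two-class example: 100 test pixels, 85 on the diagonal.
example = [[45, 5],
           [10, 40]]
metrics = accuracy_metrics(example)
```

In this example the OA is 0.85 and the chance agreement is 0.5, so the kappa coefficient is (0.85 − 0.5) / (1 − 0.5) = 0.7.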

2.8. Land Cover Change

To visualize the dynamics of land cover, a Sankey diagram was used [49]. This tool aims to present the class transitions through a flow chart. In this study, the flows represent the percentage of area in km² from one year to the next. Four maps were used and observed in three time intervals: 2021–2022, 2022–2023, and 2023–2024. Cross-tabulation matrices were also presented to compare each of the time intervals.
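The flows of a Sankey diagram of this kind come from a cross-tabulation of two classified maps. A minimal sketch, assuming two co-registered integer rasters of equal shape, is shown below; the function name and the small example maps are hypothetical.

```python
import numpy as np


def transition_percentages(map_from, map_to, class_codes):
    """Row-normalized cross-tabulation of two classified maps.

    Entry [i, j] is the percentage of pixels of class i in `map_from`
    that belong to class j in `map_to`.
    """
    index = {code: k for k, code in enumerate(class_codes)}
    counts = np.zeros((len(class_codes), len(class_codes)))
    for a, b in zip(map_from.ravel(), map_to.ravel()):
        counts[index[int(a)], index[int(b)]] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Classes absent from map_from keep a row of zeros.
    return 100.0 * counts / np.where(row_totals == 0, 1, row_totals)


# Hypothetical maps for two consecutive years with three classes.
map_year_1 = np.array([[1, 1, 2],
                       [2, 2, 3]])
map_year_2 = np.array([[1, 2, 2],
                       [2, 2, 1]])
transitions = transition_percentages(map_year_1, map_year_2, [1, 2, 3])
```

Each row sums to 100% for classes present in the first map, so the diagonal gives the share of a class that persisted and the off-diagonal entries give the shares converted to other classes.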

3. Results

3.1. Mask Creation

The forest mask was created based on the NDFI, as described in Section 2.6. After endmember selection, the NDFI was computed for each image of the annual mosaic. To select the most suitable product, a PCA was performed for each year, and only the first component was used to create the mask. The variance captured by each principal component is given by its eigenvalue. For the 2021 images, the first component had an eigenvalue of 0.8153 and explained 73.12% of the total variance, making it the most important component; the remaining components explained progressively smaller proportions of the variance. The eigenvectors give the weights, or contributions, of the original variables to each principal component. The images from April (0.4952), July (0.4466), and June (0.4192) contributed most to the first component. For the 2022 images, the first component had the highest eigenvalue (0.7547) and explained 62.60% of the total variance; the variables that contributed most to this component were April (0.49), August (0.4553), and July (0.4127). For 2023, the first component had an eigenvalue of 0.8897 and explained 65.14% of the total variance, with the largest contributions from July (0.4699), June (0.4622), and August (0.4307). For 2024, the first component had an eigenvalue of 0.5815 and explained 49.86% of the total variance, with the largest contributions from July (0.6023), April (0.5476), and August (0.5184). A filtering process was then applied, and the masks were generated (Figure 3).
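The relationship between eigenvalues, explained variance, and eigenvector weights can be illustrated with a small NumPy sketch. The data below are synthetic: six columns stand in for the monthly NDFI images (April to September) of one year, and the loadings and noise level are arbitrary choices, not values from the study.

```python
import numpy as np


def pca_explained_variance(samples):
    """PCA of an (n_pixels, n_variables) array via the covariance matrix.

    Returns eigenvalues (descending), explained-variance ratios, and
    eigenvectors as columns. Eigenvector signs are arbitrary.
    """
    centered = samples - samples.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    ratios = eigenvalues / eigenvalues.sum()
    return eigenvalues, ratios, eigenvectors


# Synthetic monthly NDFI values sharing one dominant signal plus noise.
rng = np.random.default_rng(1)
signal = rng.normal(size=(1000, 1))
loadings = np.array([1.0, 0.9, 0.8, 0.9, 1.0, 0.7])
monthly_ndfi = signal * loadings + rng.normal(0.0, 0.3, size=(1000, 6))

eigenvalues, ratios, eigenvectors = pca_explained_variance(monthly_ndfi)

# Scores of the first component: one value per pixel.
first_component = (monthly_ndfi - monthly_ndfi.mean(axis=0)) @ eigenvectors[:, 0]
```

The explained-variance ratio of a component is its eigenvalue divided by the sum of all eigenvalues, and the entries of the corresponding eigenvector are the weights of the individual months.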

3.2. Class Separability

The results obtained from the spectral separability using the JM distance yielded values between 1.9929 and 2.0000, which means that all classes have good separability from each other and can be used in the classification process using the selected feature space (B, G, R, and NIR). To depict the JM distances, a multidimensional scaling graphic [50] was created (Figure 4). It reduces the class dimension based on the JM distances for two-dimensional visualization.

3.3. ML Classification Assessment

The model was based on 2021 image data used in the classification step. The samples were represented by 239 polygons for training, 80 for validation, and 80 for testing. After rasterization, these polygons were represented in pixels, totaling 27,885 for the training set, 8635 for the validation set, and 10,056 for the test set, respectively. The number of pixels of the training, validation, and test dataset for each class was, respectively, GRA (5551, 2034, 2097), SAV (12,153, 3293, 4166), BS (1923, 594, 501), SAM (1469, 448, 575), and FOR (6789, 2266, 2717).

Considering the RF, the model was first trained using default parameters (max_depth: None; min_samples_split: 2; and n_estimators: 100) and then re-tuned over a combination of hyperparameters. The best combination found was max_depth: 10; min_samples_split: 5; and n_estimators: 100. According to Table 2, all the classes yielded a PA above 93%, indicating that most of the reference examples of these classes were correctly classified. Regarding the UA, all classes showed percentages above 96%, meaning that more than 96% of the pixels assigned to each class were correctly classified.

Taking into account the XGBoost, the combination of all parameters resulted in 192 fits. The optimized model yielded better results than the initial one. The best parameters found were learning_rate: 0.01; max_depth: 5; and n_estimators: 500. Table 3 depicts the confusion matrix and its respective accuracies. All classes yielded PA above 92% and UA above 96%.

For the SVM, the grid search fit three folds for each of the 54 candidates, totaling 162 fits. The optimized model used the best parameter combination: C: 1; degree: 2; gamma: scale; and kernel: linear. According to the confusion matrix (Table 4), all classes produced a PA above 92% and UA percentages above 96%, indicating that most pixels assigned to these classes were correctly classified.

3.4. Accuracy Evaluation

After validation, the models were evaluated, and the accuracy metrics can be found below (Table 5).

To compare the kappa coefficients between the models, Z-tests considering a significance level (α) of 0.05 were conducted with the following results (Table 6).
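A pairwise Z-test on two kappa coefficients can be sketched as follows. The text does not say how the kappa variances were estimated, so the variance function below uses a simplified large-sample approximation and should be read as an assumption; the test also treats the two kappa estimates as independent. All numeric inputs are hypothetical.

```python
import math


def kappa_variance(p_observed, p_expected, n):
    """Simplified large-sample approximation of the variance of kappa."""
    return p_observed * (1.0 - p_observed) / (n * (1.0 - p_expected) ** 2)


def kappa_z_test(kappa_a, var_a, kappa_b, var_b):
    """Two-sided Z-test for the difference between two kappa coefficients."""
    z = (kappa_a - kappa_b) / math.sqrt(var_a + var_b)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical kappa values and variances for two models.
z_score, p_value = kappa_z_test(0.96, 5e-5, 0.95, 5e-5)
significant = p_value < 0.05   # significance level of 0.05
```

With these hypothetical inputs the statistic is about 1.0 and the p-value about 0.32, so the difference would not be judged significant at α = 0.05.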

These results indicate that none of the differences between models is statistically significant (p > 0.05), suggesting that the performance of the models, as measured by the kappa coefficient, does not differ significantly. The model was therefore chosen based on the evaluated accuracy metrics, namely the F1-score (weighted average) and OA (%). A combined analysis of these metrics was made, and the model with the highest values for all of them was chosen. As shown in Table 5, the SVM model had a slight advantage over the other models, and thus, it was selected to classify all PlanetScope images.

3.5. Land Cover from 2021 to 2024

The annual maps generated by the classification show the main land covers found in the study area and their behavior throughout the study period. For the 2021 map (Figure 5), SAV showed dominance over the other classes; according to Solbrig [7], this is expected because grassland and savanna together cover about 85% of the phytogeographical province of Cerrado. The superdominant plant is also present along the borders of the forest, which, in turn, follows the drainage system. Patches of FOR can be observed in isolated areas, and they correspond to eucalyptus (black arrows). According to Guzmán et al. [51], eucalyptus populations can behave as exotic invaders depending on the initial stage of dispersion and colonization.

The 2022 map highlights a significant reduction in the area occupied by SAM, resulting from a frost event in May of that year. In the 2023 map, the GRA class expanded into areas previously occupied by SAV. Additionally, the SAM class returned to areas previously occupied in 2021. In the 2024 map, a slight decrease in the area occupied by SAM can be observed in the region of FOR. This occurred again due to low-temperature events during August and September in areas where SAM is already well established. GRA expanded into SAV areas to the east of the study site.

Figure 6 clearly illustrates the dominance of the SAV and GRA formations within the study area. The area of BS fluctuated depending on its detection by the classification method, which was less evident in the 2023 and 2024 maps. The SAM class showed a lower percentage in 2022 and again in 2024, which can be attributed to frost events in the region.

The Sankey diagram depicts the land cover dynamic, highlighting the flows and relative visual proportions of components [52]. The first period (Figure 7a, 2021–2022) indicates that, of the initial 100% of GRA, 51.10% remained in this class, while 48.52% was converted to SAV. As the model uses spectral variability between April and September, SAV areas contributed more expressively to this temporal composition. SAV retained 98.76% of its area within the same class. The BS class retained 49.72% of its area, with 37.92% converted to SAV and 12.20% to GRA. These two dominant classes ultimately absorbed these areas. The SAM class was converted to all classes, predominantly GRA (36.17%) and SAV (32.09%). FOR retained 79.91% of its area. In the second period (Figure 7b, 2022–2023), with the exception of the BS class, all the classes retained over 80% of their areas within their respective classes. BS, however, lost 55.51% of its area to other classes, primarily to GRA. The third period (Figure 7c, 2023–2024) highlighted the shift from BS to GRA, indicating a 62.49% loss in the BS area. The SAM class also experienced a 23.40% area loss to FOR.

4. Discussion

4.1. Cerrado Vegetation Mapping

This study highlighted the suitability of PlanetScope imagery for mapping Cerrado vegetation at the formation level. The high accuracy can be explained by the use of high-spatial-resolution data such as the PlanetScope images. Haddad et al. [25] showed the importance of using high spatial and temporal data (PlanetScope images) to map vegetation phenology in Brazilian savannas. Acharki [53] evaluated different satellite data (Landsat-8, Sentinel-2, and PlanetScope) to map LULC in Morocco and noticed that when spatial and spectral resolution increase, classification accuracy improves. PlanetScope provided an overall accuracy higher than 97% when compared to Landsat-8 and Sentinel-2. Lewis et al. [11] evaluated different classification arrangements and different combinations of remote sensing data to map Cerrado vegetation at formation and physiognomy levels. They also observed an improvement in the results when the input data of the classification was based on Sentinel-1 and Sentinel-2 imagery with a spatial resolution of 20 m. Classification based on Landsat-8 retrieved lower accuracy. Alencar et al. [4] applied an RF algorithm to classify Cerrado native types (forest, savanna, and grassland) and obtained an overall accuracy of 87% to 71% using Landsat imagery and the Google Earth Engine platform.

ML algorithms yielded high accuracy in the current research, with an overall accuracy of 97% and a kappa coefficient of 0.96. A similar result was observed by Schwieder et al. [54], who found an overall accuracy reaching 96% after applying a softer validation scheme and the SVM algorithm to derive a physiognomy map of Brazilian savanna vegetation. Basheer et al. [55], after evaluating different ML models to derive LULC maps in Canada, found that SVM achieved its best result with PlanetScope (OA = 94%) compared with Landsat (OA = 89%) and Sentinel (OA = 91%). The SVM is effective in problems where classes can be linearly (or nearly linearly) separated in the input space or in a space transformed by a kernel. Its goal of maximizing the margin between classes generally leads to models that generalize well [56]. The model is robust enough to tackle high-dimensionality problems, such as spectral data or images with several bands, since its performance depends on the number of samples that define the margin and not on the dimensionality of the space [57].

Research has demonstrated that ML models improve when different feature layers are used as input, such as spectral bands, spectral indices, and microwave polarization [4,11,58]. However, models based on common channels, such as the RGB and NIR bands, also provide high accuracy [56]. According to Zhang et al. [59], increasing the number of bands does not necessarily improve the OA of a deep learning model. These findings were also observed in the current research, in which all the models achieved high accuracies.

The methodology used in the current study demonstrates that high spatial resolution data can provide high-accuracy models to derive land cover maps of Cerrado vegetation at the formation level. Initiatives such as the MapBiomas project (https://mapbiomas.org/) yield annual products derived from Landsat imagery at a 30 m spatial resolution depicting the Cerrado vegetation (forest, grassland, savanna); however, in certain regions, some classes are overestimated (e.g., savanna), while others are underestimated (e.g., grassland), probably because of the spatial resolution. Despite this, the availability of free and continuous images allows the project to be used in monitoring programs. This is one of the limitations encountered when using PlanetScope data, since the free images are limited to the duration of the NICFI program.

4.2. Dynamic of Vegetation Formations from 2021 to 2024

This study revealed the dynamics of the vegetation over four years. The superdominant bracken showed perennial behavior except when frost events occurred. A meteorological station located at the study site recorded minimum temperatures below 5 °C from 19 to 22 May 2022 [60]. By the time the annual images were captured, P. esculentum had not yet re-emerged, which explains the absence of SAM on the 2022 map. It is worth noting that the savannas of the Cerrado tend to give way to grasslands in cold conditions with frequent frosts. The lower stratum tends to recover more quickly than the upper layers; on average, it returns to pre-frost conditions after 5 months. The changes that frosts cause in the community structure of Cerrado vegetation are similar to those caused by fire, in which biomass is reduced and vegetation becomes more open [61].

In 2023, an increase in the GRA area was observed. According to [6], this new area occupied by GRA is actually the subtype “Cerrado Ralo”, defined as open and low vegetation in which the trees reach 2–3 m in height and cover 5 to 20% of the area, and which is part of the SAV formation. Decades ago, this subtype was known as “Campo Cerrado”; however, the vegetation includes arboreal individuals, which places it within a formation with an arboreal structure (savanna) rather than a herbaceous one (grassland). On the other hand, in 2024, the expansion of GRA over SAV does not necessarily indicate an actual class inversion but rather a decrease in canopy leaf cover due to the prolonged dry season. The FOR class did not fluctuate much, highlighting the more perennial nature of its floristic composition.

The distribution of these vegetation formations (GRA, SAV, and FOR) is related to factors such as soil nutrients, water availability, geomorphology, and fire dynamics [11]. According to Miranda et al. [62], the biomass produced during the wet season dries out as the dry season sets in, favoring the occurrence of fire. The fire is a surface fire that mainly consumes the fine fuel of the herbaceous layer, which is fire-resistant and resprouts right away. The Cerrado is subject to wildfires, which occur naturally after the summer, when lightning events are common. However, anthropogenic fires also occur, and they cause severe damage to the vegetation [16]. The fire potential increases with the presence of the highly flammable grass M. minutiflora, whose patches act as corridors of propagation. For the study period, no fire events were recorded at the study site [16].

M. minutiflora is one of the most common invasive grasses in areas of Cerrado [63]. It can be found in borders, roadsides, disturbed areas, grasslands, and also in shady habitats on the edge and inside woods, forests, and savannas [13,64]. This species was also observed in the study area; however, the classification method could not distinguish it from other plants. Only the most common superdominant plant, P. esculentum, which has invasive behavior [65], could be discriminated. Other invasive plants were also observed in the field, such as the Pinus genus, which is considered the most aggressive invader in many natural environments, preferentially in open and humid areas [66]. It impacts ecosystem properties and functions, altering the interactions and trophic behavior of native environments [67]. Its invasion has been associated with proximity to a seed source and wind dispersion [68,69]. The main vegetation observed at the study site is shown in Figure 8.

5. Conclusions

This study demonstrated the effectiveness of using PlanetScope imagery combined with machine learning techniques to discriminate Cerrado formations and to map areas occupied by superdominant species within a protected area. The results indicated that the SVM method achieved the highest classification accuracy (OA: 97.51%; kappa coefficient: 0.9649); however, it was not significantly different from the other tested models. Additionally, the classes that dominated the landscape of the study site were SAV and GRA. The superdominant species, P. esculentum, was concentrated near the gallery forest, acting as a containment belt for the forest vegetation. Despite being decimated by a frost event, P. esculentum recovered within a few months, reclaiming its usual territory. The use of PlanetScope imagery allowed for the creation of more detailed maps of Cerrado formations, providing valuable insights for the improved management of this biome and its protected areas, which are increasingly threatened by surrounding land use pressures. The identification of Melinis patches can be achieved using sensors with higher spatial resolution, such as those onboard Remotely Piloted Aircraft (RPA). In this context, future research aims to use these sensors to observe the dynamics of this plant and support the conservation of the Cerrado within protected areas. Furthermore, this study establishes an initial framework for applying machine learning to PlanetScope images for vegetation classification. Future work can build on this foundation by exploring new techniques for data processing, sensor integration, and adapting models to different environmental scenarios.

Author Contributions

Conceptualization, T.R.; methodology, T.R.; software, T.R.; validation, T.R.; formal analysis, T.R.; investigation, T.R., F.T., A.D. and T.L.; resources, T.R.; data curation, T.R.; writing—original draft preparation, T.R., F.T. and E.A.; writing—review and editing, T.R., F.T., A.D., T.L. and E.A.; visualization, T.R.; supervision, T.R.; project administration, T.R.; funding acquisition, T.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Foundation for Research Support of the Federal District (FAP/DF), grant number project 00193-00002228/2023-82.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

The authors are grateful to the IBGE Ecological Reserve for the research authorization (project code PC 109). The satellite data have been provided under the NICFI Satellite Data Program. We also thank the reviewers for their valuable suggestions.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Mittermeier, R.A.; Myers, N.; Mittermeier, C.G.; Robles Gil, P. (Eds.) Hotspots: Earth’s Biologically Richest and Most Endangered Terrestrial Ecoregions, 1st ed.; CEMEX, Conservation International: Mexico City, Mexico, 1999; ISBN 978-968-6397-58-1.
2. Silva, J.M.C.; Bates, J.M. Biogeographic Patterns and Conservation in the South American Cerrado: A Tropical Savanna Hotspot. BioScience 2002, 52, 225.
3. Klink, C.A.; Machado, R.B. Conservation of the Brazilian Cerrado. Conserv. Biol. 2005, 19, 707–713.
4. Alencar, A.; Shimbo, J.Z.; Lenti, F.; Balzani Marques, C.; Zimbres, B.; Rosa, M.; Arruda, V.; Castro, I.; Fernandes Márcico Ribeiro, J.; Varela, V.; et al. Mapping Three Decades of Changes in the Brazilian Savanna Native Vegetation Using Landsat Data Processed in the Google Earth Engine Platform. Remote Sens. 2020, 12, 924.
5. Klink, C.A.; Moreira, A.G. Past and Current Human Occupation, and Land Use. In The Cerrados of Brazil: Ecology and Natural History of a Neotropical Savanna; Columbia University Press: New York, NY, USA, 2002; pp. 69–88.
6. Ribeiro, J.F.; Walter, B.M.T. Fitofisionomias do bioma Cerrado. In Cerrado: Ambiente e Flora; Sano, S.M., Almeida, S.P., Eds.; Embrapa-CPAC: Planaltina, DF, Brazil, 1998; Cap. 3; pp. 87–166.
7. Solbrig, O.T. The Diversity of the Savanna Ecosystem. In Biodiversity and Savanna Ecosystem Processes; Solbrig, O.T., Medina, E., Silva, J.F., Eds.; Ecological Studies; Springer: Berlin/Heidelberg, Germany, 1996; Volume 121, pp. 31–41. ISBN 978-3-642-78971-7.
8. Neri, A.V.; Schaefer, C.E.G.R.; Souza, A.L.; Ferreira-Junior, W.G.; Meira-Neto, J.A.A. Pedology and Plant Physiognomies in the Cerrado, Brazil. An. Acad. Bras. Ciênc. 2013, 85, 87–102.
9. Neves, S.P.S.; Funch, R.; Conceição, A.A.; Miranda, L.A.P.; Funch, L.S. What Are the Most Important Factors Determining Different Vegetation Types in the Chapada Diamantina, Brazil? Braz. J. Biol. 2016, 76, 315–333.
10. Bueno, M.L.; Dexter, K.G.; Pennington, R.T.; Pontara, V.; Neves, D.M.; Ratter, J.A.; De Oliveira-Filho, A.T. The Environmental Triangle of the Cerrado Domain: Ecological Factors Driving Shifts in Tree Species Composition between Forests and Savannas. J. Ecol. 2018, 106, 2109–2120.
11. Lewis, K.; De Van Barros, F.; Cure, M.B.; Davies, C.A.; Furtado, M.N.; Hill, T.C.; Hirota, M.; Martins, D.L.; Mazzochini, G.G.; Mitchard, E.T.A.; et al. Mapping Native and Non-Native Vegetation in the Brazilian Cerrado Using Freely Available Satellite Products. Sci. Rep. 2022, 12, 1588.
12. Lehmann, C.E.R.; Anderson, T.M.; Sankaran, M.; Higgins, S.I.; Archibald, S.; Hoffmann, W.A.; Hanan, N.P.; Williams, R.J.; Fensham, R.J.; Felfili, J.; et al. Savanna Vegetation-Fire-Climate Relationships Differ Among Continents. Science 2014, 343, 548–552.
13. Pivello, V.R.; Shida, C.N.; Meirelles, S.T. Alien Grasses in Brazilian Savannas: A Threat to the Biodiversity. Biodivers. Conserv. 1999, 8, 1281–1294.
14. Miatto, R.C.; Silva, I.A.; Silva-Matos, D.M.; Marrs, R.H. Woody Vegetation Structure of Brazilian Cerrado Invaded by Pteridium arachnoideum (Kaulf.) Maxon (Dennstaedtiaceae). Flora Morphol. Distrib. Funct. Ecol. Plants 2011, 206, 757–762.
15. Pivello, V.R.; Vieira, M.V.; Grombone-Guaratini, M.T.; Matos, D.M.S. Thinking about Super-Dominant Populations of Native Species—Examples from Brazil. Perspect. Ecol. Conserv. 2018, 16, 74–82.
16. Mistry, J.; Berardi, A. Assessing Fire Potential in a Brazilian Savanna Nature Reserve. Biotropica 2005, 37, 439–451.
17. Hoffmann, W.A.; Lucatelli, V.M.P.C.; Silva, F.J.; Azeuedo, I.N.C.; Marinho, M.D.S.; Albuquerque, A.M.S.; Lopes, A.D.O.; Moreira, S.P. Impact of the Invasive Alien Grass Melinis minutiflora at the Savanna-forest Ecotone in the Brazilian Cerrado. Divers. Distrib. 2004, 10, 99–103.
18. Hoffmann, W.A.; Haridasan, M. The Invasive Grass, Melinis minutiflora, Inhibits Tree Regeneration in a Neotropical Savanna. Austral Ecol. 2008, 33, 29–36.
19. Miranda, H.S.; Bustamante, M.M.C.; Miranda, A.C. The fire factor. In The Cerrados of Brazil: Ecology and Natural History of a Neotropical Savanna; Oliveira, P.S., Marquis, R.J., Eds.; Columbia University Press: New York, NY, USA, 2002; pp. 51–68.
20. Assis, G.B.; Pilon, N.A.L.; Siqueira, M.F.; Durigan, G. Effectiveness and Costs of Invasive Species Control Using Different Techniques to Restore Cerrado Grasslands. Restor. Ecol. 2021, 29, e13219.
21. Souza, C.M.; Shimbo, J.Z.; Rosa, M.R.; Parente, L.L.; Alencar, A.A.; Rudorff, B.F.T.; Hasenack, H.; Matsumoto, M.; Ferreira, L.G.; Souza-Filho, P.W.M.; et al. Reconstructing Three Decades of Land Use and Land Cover Changes in Brazilian Biomes with Landsat Archive and Earth Engine. Remote Sens. 2020, 12, 2735.
22. Françoso, R.D.; Brandão, R.; Nogueira, C.C.; Salmona, Y.B.; Machado, R.B.; Colli, G.R. Habitat Loss and the Effectiveness of Protected Areas in the Cerrado Biodiversity Hotspot. Nat. Conserv. 2015, 13, 35–40.
23. Sano, E.E.; Rosa, R.; Brito, J.L.S.; Ferreira, L.G. Land Cover Mapping of the Tropical Savanna Region in Brazil. Environ. Monit. Assess. 2010, 166, 113–124.
24. Fonseca, L.M.G.; Körting, T.S.; Bendini, H.D.N.; Girolamo-Neto, C.D.; Neves, A.K.; Soares, A.R.; Taquary, E.C.; Maretto, R.V. Pattern Recognition and Remote Sensing Techniques Applied to Land Use and Land Cover Mapping in the Brazilian Savannah. Pattern Recognit. Lett. 2021, 148, 54–60.
25. Haddad, I.; Galvão, L.S.; Breunig, F.M.; Dalagnol, R.; Bourscheidt, V.; Jacon, A.D. On the Combined Use of Phenological Metrics Derived from Different PlanetScope Vegetation Indices for Classifying Savannas in Brazil. Remote Sens. Appl. Soc. Environ. 2022, 26, 100764.
26. FIBGE—Fundação Instituto Brasileiro de Geografia e Estatística. Zoneamento Ambiental da Bacia do Córrego Taquara—Distrito Federal; Versão Preliminar: Rio de Janeiro, Brazil, 1995; Volume I.
27. UNESCO. Subsídios ao Zoneamento da APA Gama-Cabeça de Veado e Reserva da Biosfera do Cerrado: Caracterização e Conflitos Socioambientais; UNESCO, MAB, Reserva da Biosfera do Cerrado: Brasília, Brazil, 2003.
28. Simões, K.C.C.; Hay, J.D.V.; de Andrade, C.O.; de Carvalho, O.A., Jr.; Gomes, R.A.T. Distribuição de Cana-do-Reino (Arundo donax L.) no Distrito Federal; Biodiversidade Brasileira-BioBrasil: Vinhedo, Brazil, 2013; p. 2.
29. Martins, C.R.; Hay, J.D.V.; Valls, J.F.; Leite, L.L.; Henriques, R.P.B. Study on alien gramineous of the Brasilia National Park, Federal District, Brazil. Nat. Conserv. 2007, 5, 93–100.
30. D’Antonio, C.M.; Vitousek, P.M. Biological invasions by exotic grasses, the grass/fire cycle, and global change. Annu. Rev. Ecol. Evol. Syst. 1992, 23, 63–87.
31. Oliveira, V.M.; Schwartsburd, P.B.; Brighenti, A.M.; D’oliveira, P.S.; Miranda, J.E.C. Plantas tóxicas em pastagens: Samambaia-do-campo (Pteridium esculentum subsp. arachnoideum (Kaulf.) Thomson, Família Dennstaedtiaceae). In Comunicado Técnico; EMBRAPA: Brasilia, Brazil, 2018.
32. Norway’s International Climate and Forests Initiative (NICFI). NICFI Satellite Data Program User Guide; Norway’s International Climate and Forests Initiative (NICFI): Oslo, Norway, 2022.
33. Bueno, I.T.; Acerbi Júnior, F.W.; Silveira, E.M.O.; Mello, J.M.; Carvalho, L.M.T.; Gomide, L.R.; Withey, K.; Scolforo, J.R.S. Object-Based Change Detection in the Cerrado Biome Using Landsat Time Series. Remote Sens. 2019, 11, 570.
34. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
35. Sheykhmousa, M.; Mahdianpari, M.; Ghanbari, H.; Mohammadimanesh, F.; Ghamisi, P.; Homayouni, S. Support vector machine versus random forest for remote sensing image classification: A meta-analysis and systematic review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 6308–6325.
36. Jafarzadeh, H.; Mahdianpari, M.; Gill, E.; Mohammadimanesh, F.; Homayouni, S. Bagging and Boosting Ensemble Classifiers for Classification of Multispectral, Hyperspectral and PolSAR Data: A Comparative Evaluation. Remote Sens. 2021, 13, 4405.
37. Breiman, L. Using Adaptive Bagging to Debias Regressions; Technical Report 547; Statistics Dept, UCB: Brussels, Belgium, 1999.
38. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
39. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
40. Chemura, A.; Rwasoka, D.; Mutanga, O.; Dube, T.; Mushore, T. The Impact of Land-Use/Land Cover Changes on Water Balance of the Heterogeneous Buzi Sub-Catchment, Zimbabwe. Remote Sens. Appl. Soc. Environ. 2020, 18, 100292.
41. Vapnik, V.N. The Nature of Statistical Learning Theory, 2nd ed.; Springer: New York, NY, USA, 1995; p. 314.
42. Kavzoglu, T.; Colkesen, I. A Kernel Functions Analysis for Support Vector Machines for Land Cover Classification. Int. J. Appl. Earth Obs. Geoinf. 2009, 11, 352–359.
43. Huang, C.; Davis, L.S.; Townshend, J.R.G. An Assessment of Support Vector Machines for Land Cover Classification. Int. J. Remote Sens. 2002, 23, 725–749.
44. Pouteau, R.; Meyer, J.-Y.; Taputuarai, R.; Stoll, B. Support Vector Machines to Map Rare and Endangered Native Plants in Pacific Islands Forests. Ecol. Inform. 2012, 9, 37–46.
45. Kulkarni, K.; Vijaya, P.A. Separability analysis of the band combinations for land cover classification of satellite images. Int. J. Eng. Trends Technol. 2021, 69, 138–144.
46. Souza, C.M.; Roberts, D.A.; Cochrane, M.A. Combining Spectral and Spatial Information to Map Canopy Damage from Selective Logging and Forest Fires. Remote Sens. Environ. 2005, 98, 329–343.
47. Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 1991, 37, 35–46.
48. Sasaki, Y. The truth of the F-measure. Teach. Tutor Mater. 2007, 1, 1–5.
49. Cuba, N. Research Note: Sankey Diagrams for Visualizing Land Cover Dynamics. Landsc. Urban Plan. 2015, 139, 163–167.
50. Buja, A.; Swayne, D.F.; Littman, M.L.; Dean, N.; Hofmann, H.; Chen, L. Data Visualization with Multidimensional Scaling. J. Comput. Graph. Stat. 2008, 17, 444–472.
51. Guzmán, D.M.; Drummond, S.A.; Barreto, J.G. Undesirable Neighbours: Eucalyptus and Protected Areas. In Protected Area Management—Recent Advances; Nazip Suratman, M., Ed.; IntechOpen: London, UK, 2022; ISBN 978-1-83969-812-5.
52. Liu, X.; Fu, D.; Zevenbergen, C.; Busker, T.; Yu, M. Assessing Sponge Cities Performance at City Scale Using Remotely Sensed LULC Changes: Case Study Nanjing. Remote Sens. 2021, 13, 580.
53. Acharki, S. PlanetScope contributions compared to Sentinel-2, and Landsat-8 for LULC mapping. Remote Sens. Appl. Soc. Environ. 2022, 27, 100774.
54. Schwieder, M.; Leitão, P.J.; Da Cunha Bustamante, M.M.; Ferreira, L.G.; Rabe, A.; Hostert, P. Mapping Brazilian Savanna Vegetation Gradients with Landsat Time Series. Int. J. Appl. Earth Obs. Geoinf. 2016, 52, 361–370.
55. Basheer, S.; Wang, X.; Farooque, A.A.; Nawaz, R.A.; Liu, K.; Adekanmbi, T.; Liu, S. Comparison of Land Use Land Cover Classifiers Using Different Satellite Imagery and Machine Learning Techniques. Remote Sens. 2022, 14, 4978.
56. Chachondhia, P.; Shakya, A.; Kumar, G. Performance Evaluation of Machine Learning Algorithms Using Optical and Microwave Data for LULC Classification. Remote Sens. Appl. Soc. Environ. 2021, 23, 100599.
57. Sheykhmousa, M.; Kerle, N.; Kuffer, M.; Ghaffarian, S. Post-Disaster Recovery Assessment with Machine Learning-Derived Land Cover and Land Use Information. Remote Sens. 2019, 11, 1174.
58. Camargo, F.F.; Sano, E.E.; Almeida, C.M.; Mura, J.C.; Almeida, T. A Comparative Assessment of Machine-Learning Techniques for Land Use and Land Cover Classification of the Brazilian Tropical Savanna Using ALOS-2/PALSAR-2 Polarimetric Images. Remote Sens. 2019, 11, 1600.
59. Zhang, P.; Ke, Y.; Zhang, Z.; Wang, M.; Li, P.; Zhang, S. Urban Land Use and Land Cover Classification Using Novel Deep Learning Models Based on High Spatial Resolution Satellite Imagery. Sensors 2018, 18, 3717.
60. Angelita, C.; Betânia, G.; Ionaí, M.; Mariza, P.; Leonardo, B. Frost Damage in a Lobelia Brasiliensis (Campanulaceae) Population at Reserva Ecologica Do Ibge, Brasilia –Federal District, Brazil. Rev. Biol. Neotrop. J. Neotrop. Biol. 2023, 20, 44–49.
61. Brando, P.M.; Durigan, G. Changes in Cerrado Vegetation after Disturbance by Frost (São Paulo State, Brazil). Plant Ecol. 2004, 175, 205–215.
62. Miranda, H.S.; Sato, M.N.; Neto, W.N.; Aires, F.S. Fires in the Cerrado, the Brazilian Savanna. In Tropical Fire Ecology; Springer: Berlin/Heidelberg, Germany, 2009; pp. 427–450. ISBN 978-3-540-77380-1.
63. Tambosi, L.R.; Barbosa, E.G. Uso de modelos de nicho ecológico, gerados em escala local, para identificação de áreas suscetíveis à invasão de gramíneas africanas em uma reserva de cerrado do estado de São Paulo. In Proceedings of the Anais XIV Simpósio Brasileiro de Sensoriamento Remoto, Natal, Brazil, 25–30 April 2009; INPE: Natal, Brazil, 2009; pp. 3111–3118.
64. Sciamarelli, A.; Guglieri-Caporal, A.; Caporal, F.J.M. Prediction for expansion of two invasive grasses in Mato Grosso do Sul, Brazil, using climatic data and NDVI/MODIS. Número Espec. 2011, 36, 98–106.
65. Hojo-Souza, N.S.; Carneiro, C.M.; Santos, R.C.d. Pteridium aquilinum: O que sabemos e o que ainda falta saber. Biosci. J. 2010, 26, 798–808.
66. Zalba, S.M.; Cuevas, Y.A.; Boó, R.M. Invasion of Pinus halepensis Mill. following a wildfire in an Argentine grassland nature reserve. J. Environ. Manag. 2008, 88, 539–546.
67. Chen, C.R.; Condron, L.M.; Xu, Z.H. Impacts of grassland afforestation with coniferous trees on soil phosphorus dynamics and associated microbial processes: A review. For. Ecol. Manag. 2008, 255, 396–409.
68. Rejmánek, M.; Richardson, D.M. What attributes make some plant species more invasive? Ecology 1996, 77, 1655–1661.
69. Zanchetta, D.; Diniz, F.V. Estudo da contaminação biológica por Pinus spp. em três diferentes áreas na estação ecológica de Itirapina (SP, BRASIL). Rev. Inst. Florest. 2006, 18, 1–14.

Figure 1. Map of the IBGE Ecological Reserve using PlanetScope image from April 2024. The pictures were collected in the field campaigns and represent (a) M. minutiflora, (b) Trembleya parviflora, and (c) P. esculentum.

Figure 2. Methodological steps to derive PlanetScope classified data.

Figure 4. Multidimensional scaling graphic to depict the JM distance between the classes (class 1: GRA; class 2: SAV; class 3: BS; class 4: SAM; and class 5: FOR).

Figure 5. Land cover from 2021 to 2024. The black arrows indicate the location of eucalyptus patches.

Figure 6. Area (km²) of each class over the period from 2021 to 2024. GRA: grassland, SAV: savanna, BS: bare soil, SAM: samambaião and FOR: forest.

Figure 7. Sankey diagrams highlighting surface cover changes between 2021 and 2024. GRA: grassland, SAV: savanna, BS: bare soil, SAM: samambaião and FOR: forest.

Figure 8. Main vegetation features found at the study site. (a) Gallery forest and M. minutiflora (FOR); (b) SAV; (c) M. minutiflora; (d) Pinus on the horizon.

Table 1. Hyperparameter tuning for RF, XGBoost, and SVM methods.

Algorithm	Parameter	Values
RF	n_estimators	100, 300, 500
	max_depth	10, 20, 30
	min_samples_split	2, 5, 10
XGBoost	n_estimators	100, 300, 500, 700
	max_depth	3, 5, 7, 9
	learning_rate	0.01, 0.1, 0.2, 0.3
SVM	C	0.1, 1, 10
	kernel	linear, poly, rbf
	degree	2, 3, 4
	gamma	scale, auto
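To give a sense of the size of the search spaces in Table 1, the sketch below enumerates every nominal parameter combination per algorithm. It only counts candidates; it does not reproduce the tuning or validation procedure used in the study, and the helper name `enumerate_grid` is hypothetical. Note that in typical SVM implementations `degree` only affects the polynomial kernel, so some nominal SVM combinations are effectively duplicates.

```python
from itertools import product

# Search spaces copied from Table 1.
grids = {
    "RF": {
        "n_estimators": [100, 300, 500],
        "max_depth": [10, 20, 30],
        "min_samples_split": [2, 5, 10],
    },
    "XGBoost": {
        "n_estimators": [100, 300, 500, 700],
        "max_depth": [3, 5, 7, 9],
        "learning_rate": [0.01, 0.1, 0.2, 0.3],
    },
    "SVM": {
        "C": [0.1, 1, 10],
        "kernel": ["linear", "poly", "rbf"],
        "degree": [2, 3, 4],
        "gamma": ["scale", "auto"],
    },
}


def enumerate_grid(grid):
    """List every combination of the values in a parameter grid."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Number of nominal candidates an exhaustive search would evaluate.
n_candidates = {name: len(enumerate_grid(grid)) for name, grid in grids.items()}
print(n_candidates)
```

An exhaustive grid search would fit one model per candidate (times the number of validation folds, if cross-validation is used), which is why the grids are kept to a few values per parameter.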

Table 2. RF accuracy after optimization.

Class Name	GRA	SAV	BS	SAM	FOR	Total	UA
GRA	2061	0	36	0	0	2097	0.98
SAV	166	3994	4	0	2	4166	0.96
BS	0	2	499	0	0	501	1.00
SAM	0	9	0	555	11	575	0.97
FOR	0	24	0	11	2682	2717	0.99
Total	2227	4029	539	566	2695	10,056
PA	0.93	0.99	0.93	0.98	1.00

Table 3. XGBoost accuracy after optimization.

Class Name	GRA	SAV	BS	SAM	FOR	Total	UA
GRA	2057	1	39	0	0	2097	0.98
SAV	169	3986	4	0	7	4166	0.96
BS	0	1	500	0	0	501	1.00
SAM	0	9	0	558	8	575	0.97
FOR	0	25	0	16	2676	2717	0.98
Total	2226	4022	543	574	2691	10,056
PA	0.92	0.99	0.92	0.97	0.99

Table 4. SVM accuracy after optimization.

Class Name	GRA	SAV	BS	SAM	FOR	Total	UA
GRA	2093	0	4	0	0	2097	1.00
SAV	183	3983	0	0	0	4166	0.96
BS	0	5	496	0	0	501	0.99
SAM	0	9	0	558	8	575	0.97
FOR	0	29	0	12	2676	2717	0.98
Total	2276	4026	500	570	2684	10,056
PA	0.92	0.99	0.99	0.98	1.00
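As a cross-check of how the summary metrics follow from a confusion matrix, the sketch below recomputes the overall accuracy and Cohen's kappa from the SVM counts in Table 4, using the standard definitions (observed agreement on the diagonal, chance agreement from the row and column totals). The function name `overall_accuracy_and_kappa` is hypothetical; this is not the authors' code.

```python
# Confusion matrix copied from Table 4 (rows and columns: GRA, SAV, BS, SAM, FOR).
svm_matrix = [
    [2093, 0, 4, 0, 0],
    [183, 3983, 0, 0, 0],
    [0, 5, 496, 0, 0],
    [0, 9, 0, 558, 8],
    [0, 29, 0, 12, 2676],
]


def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix."""
    n = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    observed = diagonal / n  # proportion of correctly classified samples
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


oa, kappa = overall_accuracy_and_kappa(svm_matrix)
print(f"OA = {100 * oa:.4f}%, kappa = {kappa:.4f}")
```

The result agrees with the SVM row of Table 5 (OA of 97.5139% and kappa of 0.9649) to the reported precision.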

Table 5. Accuracy metrics for each ML method: F1-score weighted average, OA, and kappa coefficient values.

Algorithm	F1-Score (Weighted Avg)	OA (%)	kappa
RF	0.97	97.3648	0.9629
XGBoost	0.97	97.2255	0.9609
SVM	0.98	97.5139	0.9649

Table 6. Results of pairwise Z-tests.

Comparison	Z	p-Value
RF vs. XGBoost	0.6204	0.5350
RF vs. SVM	−0.6369	0.5242
XGBoost vs. SVM	1.2576	0.2086
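The p-values in Table 6 appear consistent with two-sided p-values of a standard normal statistic. The sketch below recomputes them from the tabulated Z values under that assumption; it makes no claim about how the Z statistics themselves were derived, and the function name `two_sided_p_value` is hypothetical.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value of a standard normal test statistic."""
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    return math.erfc(abs(z) / math.sqrt(2))


# Z values copied from Table 6.
z_scores = {
    "RF vs. XGBoost": 0.6204,
    "RF vs. SVM": -0.6369,
    "XGBoost vs. SVM": 1.2576,
}
p_values = {name: two_sided_p_value(z) for name, z in z_scores.items()}

for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")
```

All three p-values are well above 0.05, which matches the statement that the models did not differ significantly in accuracy.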

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
