1. Introduction
Our society is highly dependent on a functional and stable land system for food production and for access to natural resources such as water, timber, fiber, ore and fuel, among other ecosystem goods and services [1]. However, human-induced land use and land cover (LULC) changes over the past 50 years have been altering the composition, structure and services of land ecosystems at an unprecedented rate [4]. As a consequence, the equilibrium among land, atmospheric and ocean systems, human welfare and well-being, and global biodiversity are at high risk [
Brazil is one of the most biodiverse countries in the world [7], with six unique biomes: Amazon, Atlantic Forest, Caatinga, Cerrado, Pampa and Pantanal. These biomes hold large carbon stocks in their forests [8] and soils [9], and also hold the largest freshwater reserves in the world [10]. On the other hand, Brazil is one of the world's largest producers of agricultural commodities and has been a major contributor to global greenhouse gas (GHG) emissions from LULC change [
Deforestation for pasture and agricultural expansion, infrastructure development, urban growth, and political and financial incentives for land occupation are the main drivers of LULC change in the Brazilian biomes, affecting biodiversity, water resources, carbon emissions, and regional and local climate [13]. Currently, the biomes under the most pressure on their original land cover are the two largest ones, the Amazon (419 Mha, i.e., 49% of the country) and the Cerrado (203 Mha, i.e., 23% of the country) [15]. However, the Atlantic Forest (111 Mha, i.e., 13% of the country) is the Brazilian biome that underwent the most extensive LULC change in the past, dating back to the colonial period in the sixteenth century [17]. This biome is highly fragmented by roads and urban centers [18] and embedded in a large agricultural matrix, leaving 11.7% of old secondary forest cover (i.e., >30 years) [19]. The semi-arid Caatinga biome (84 Mha, i.e., 10% of the country), located in northeastern Brazil, is considered the Brazilian biome most altered by LULC change [20] and is now mainly covered by secondary forest [22]. The Pantanal biome (15 Mha, i.e., 1.7% of the country) is likewise under high conversion pressure: cattle ranching and sugarcane expansion are driving the suppression of its natural grassland vegetation and extensive wetlands [24]. Last, but not least, the Pampa biome (17 Mha, i.e., 2% of the country), located in the southernmost region of the country, is composed mostly of natural grasslands with shrubs, trees and rocky outcrops [25]. Cattle ranching and agriculture have altered most of the natural grassland of this biome [26], which has been considered neglected due to inadequate protection and conservation policies [
Spatially explicit information on the historical trajectories of LULC in Brazil is key to informing the planning and sustainable management of natural resources, policy formulation, and other societal applications. Nevertheless, as a consequence of governmental policies and funding focused on the biomes that host most of the remaining threatened Brazilian natural vegetation, maps measuring the historical extent and intensity of LULC change exist mostly for the Amazon and Cerrado biomes, and are scarce and/or lack adequate spatial and temporal resolution for the other biomes. Examples of mapping efforts include the Probio project from 2002 by the Brazilian Ministry of Environment, the National Inventory from 1994, 2002 and 2010 [30], and the Brazilian Institute of Geography and Statistics (IBGE) LULC maps from 2000, 2010, 2012, 2014, 2016 and 2018. These national mapping initiatives mostly combined image pre-processing and enhancement with visual interpretation to label and digitize classes, an approach that is time-consuming and prohibitively expensive for annual mapping and for reconstructing long (i.e., >30 years) historical LULC information. Global LULC products are derived from satellite data ranging from coarse to fine spatial resolution (i.e., 1 km to 30 m) and cover shorter time intervals [33]. Additionally, local experts have little or no involvement in the production of global maps, which therefore require further assessment by experts at the national level [
Open access to the Landsat archive [37], the new Google Earth Engine (GEE) cloud computing platform with its machine learning algorithms [39], and a network named MapBiomas (https://mapbiomas.org/), which brings together experts in remote sensing, computing, data science and the biomes, allowed us to reconstruct annual LULC classifications at 30 m spatial resolution between 1985 and 2017. Our research network implemented image-processing algorithms in GEE to pre-process and normalize all Landsat images, which were then used to train a random forest classifier to map the LULC classes of all biomes in Brazil. Massive cloud computing enabled the fast, automated processing of a large set of images making up a 33-year time series of annual mosaics covering the entire extent of Brazil. Applying cloud and cloud-shadow masking algorithms allowed us to overcome the limitation that cloudiness in Landsat scenes imposes on LULC mapping, as previously reported elsewhere [
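The masking-and-compositing idea can be illustrated with a minimal plain-Python sketch. It assumes, purely for illustration, that each pixel's annual mosaic value is the median of its cloud-free observations for one band; the median reducer and the helper names `annual_composite` and `composite_band` are hypothetical and do not reproduce the MapBiomas GEE code.

```python
from statistics import median
from typing import List, Optional


def annual_composite(values: List[float], masked: List[bool]) -> Optional[float]:
    """Median of one pixel's cloud-free observations in one year (hypothetical reducer).

    values[t] is the band value of scene t; masked[t] is True when a cloud or
    cloud-shadow mask flagged that observation. Returns None if all are masked.
    """
    if len(values) != len(masked):
        raise ValueError("values and masked must have the same length")
    clear = [v for v, m in zip(values, masked) if not m]
    return median(clear) if clear else None


def composite_band(stack: List[List[float]], masks: List[List[bool]]) -> List[Optional[float]]:
    """Build an annual mosaic for one band; stack[t][p] is scene t, pixel p."""
    n_pixels = len(stack[0])
    return [
        annual_composite([scene[p] for scene in stack], [mask[p] for mask in masks])
        for p in range(n_pixels)
    ]


# Three scenes of a two-pixel image; a cloud hits pixel 0 in scene 1
# and pixel 1 in scene 2 (illustrative values only).
stack = [[0.10, 0.30], [0.85, 0.32], [0.12, 0.90]]
masks = [[False, False], [True, False], [False, True]]
mosaic = composite_band(stack, masks)
```

Because masked observations are dropped before the median is taken, the bright cloud values (0.85 and 0.90) do not leak into the mosaic; a pixel with no clear observation in a year is left as a gap (`None`) rather than filled.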
The objectives of this paper were threefold. First, we aimed to present how we reconstructed the annual time series of LULC maps for all Brazilian biomes between 1985 and 2017 by combining Landsat data, GEE, machine learning and a network of local experts, within a concept of progressively evolving LULC map collections. The second objective was to assess the extent, rates and main drivers of LULC change in the Brazilian biomes between 1985 and 2017 using the resulting LULC time series. The last objective was to present the MapBiomas image processing and classification protocol, which maps the main land cover classes separately for each biome and for common cross-cutting land use themes (i.e., pasture, agriculture, coastal zone, and urban infrastructure), followed by the integration of the LULC map products. We then demonstrate that the proposed MapBiomas protocol is a step-wise learning process that draws on local experts and user feedback to improve the annual LULC maps. We also discuss the current applications of this free, open access dataset to science, policy and the monitoring of LULC change in Brazil, as well as the remaining uncertainties and challenges of our LULC mapping approach.
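The integration step, in which biome and cross-cutting theme maps are combined into one LULC map, can be sketched as a per-pixel priority lookup. The class names, the order in `PREVALENCE` and the helper functions below are hypothetical placeholders chosen for illustration; the actual MapBiomas prevalence rules are defined in its integration protocol and are not reproduced here.

```python
from typing import Dict, List, Optional

# Hypothetical prevalence order (illustration only): when several maps
# label the same pixel, the class listed first wins.
PREVALENCE = ["urban", "water", "agriculture", "pasture", "forest"]


def integrate_pixel(labels: Dict[str, Optional[str]], prevalence: List[str]) -> Optional[str]:
    """Pick the highest-priority class proposed for one pixel.

    labels maps each input map name to the class it assigns (None = no detection).
    Classes missing from the prevalence list are ignored.
    """
    proposed = {c for c in labels.values() if c is not None}
    for cls in prevalence:
        if cls in proposed:
            return cls
    return None  # no input map labelled this pixel with a ranked class


def integrate(maps: Dict[str, List[Optional[str]]], prevalence: List[str]) -> List[Optional[str]]:
    """Integrate equally sized, co-registered per-pixel maps into one map."""
    n_pixels = len(next(iter(maps.values())))
    return [
        integrate_pixel({name: m[i] for name, m in maps.items()}, prevalence)
        for i in range(n_pixels)
    ]


maps = {
    "biome": ["forest", "forest", "pasture", None],
    "urban_theme": [None, "urban", None, None],
    "agriculture_theme": [None, None, "agriculture", None],
}
final_map = integrate(maps, PREVALENCE)
```

Keeping the rules as an explicit ordered list makes the precedence easy to audit and to revise between collections, at the cost of ignoring classifier confidence.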
4. Discussion
This is the first time that LULC change has been quantified in all Brazilian biomes at this level of spatial detail (i.e., 30 m pixel size) using a Landsat time series of more than 30 years. Until now, LULC change information for Brazil was either restricted in space and time, covering a few biomes over short periods (e.g., [55]), or covered long time series but focused on deforestation in part of a single biome [56]. Coarser spatial resolution remote sensing images have also been used to map LULC with Google Earth Engine across all biomes for a single year [57], and global LULC products [31] are available, but with limited input from local experts. We did not attempt to investigate the level of spatial and temporal agreement between the MapBiomas LULC maps and the existing regional and global ones. This task is an ongoing effort of our research group and requires harmonizing the LULC classification schemes and ensuring spatial and temporal coherence among the LULC maps before the agreement analysis can be undertaken [58]. A recent study by another research group compared their LULC maps, produced with PROBA-V imagery at 100 m pixel size, with our MapBiomas LULC map for 2015 and found 69% agreement among the most representative LULC classes (i.e., forestland, shrubland, grassland, pastureland, cropland and water body, as used in the PROBA-V study) [57]. However, that study did not investigate which of these LULC products had the highest accuracy.
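An agreement figure of this kind is the share of pixels with matching labels after both legends are mapped to common classes. A minimal sketch of that computation follows, assuming both maps have already been resampled to the same grid; the crosswalk dictionaries and class names are hypothetical and are not taken from the PROBA-V study or from MapBiomas.

```python
from typing import Dict, List, Optional, Sequence


def harmonize(labels: Sequence[str], crosswalk: Dict[str, str]) -> List[Optional[str]]:
    """Translate native class names into a shared legend; unmatched classes become None."""
    return [crosswalk.get(c) for c in labels]


def percent_agreement(a: Sequence[Optional[str]], b: Sequence[Optional[str]]) -> float:
    """Percentage of pixels with identical harmonized labels, over pixels where
    both maps have a class in the shared legend."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no pixel has a harmonized class in both maps")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical five-pixel maps with different native legends.
map_a = ["Forest Formation", "Pasture", "Pasture", "Soybean", "Beach"]
map_b = ["Tree cover", "Grass", "Herbaceous", "Cropland", "Bare"]
crosswalk_a = {"Forest Formation": "forestland", "Pasture": "pastureland", "Soybean": "cropland"}
crosswalk_b = {"Tree cover": "forestland", "Grass": "pastureland",
               "Herbaceous": "grassland", "Cropland": "cropland"}
agreement = percent_agreement(harmonize(map_a, crosswalk_a), harmonize(map_b, crosswalk_b))
```

The fifth pixel is excluded because neither "Beach" nor "Bare" has a counterpart in the shared legend, which is one reason agreement figures depend on how the classification schemes are harmonized.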
The annual LULC dataset presented in this study enables numerous applications, such as estimating vegetation gain and loss and understanding the drivers of land cover change. Between 1985 and 2017, 38% of the Brazilian territory was modified by cattle ranching, agriculture and infrastructure development, which converted native forest and non-forest formations in all six biomes alike. Pasture expanded by 46% in the country, mainly in the Amazon and Pantanal biomes, while agriculture increased by 172%, mostly in the Atlantic Forest, where it replaced old pastures, and in the Cerrado, where it converted savanna and grassland formations. Our LULC dataset revealed that 86 Mha of the converted native vegetation is undergoing some level of regrowth. The MapBiomas time series also showed that, in the Amazon biome, secondary vegetation reached 12 Mha in 2017 [59], an area exceeding 45.5% of the primary deforestation mapped by the Brazilian monitoring system (PRODES) [60]. Thus, our annual LULC dataset goes beyond existing LULC studies and monitoring systems and helps fill information and knowledge gaps in monitoring LULC dynamics in the country over the past three decades.
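Figures such as the relative expansion of pasture and agriculture, and the areas of gain and loss, follow from cross-tabulating the class of each pixel at two dates. The sketch below assumes two co-registered annual maps stored as flat lists and equal-area 30 m pixels (0.09 ha each); maps in geographic coordinates would need per-pixel areas, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

PIXEL_AREA_HA = 0.09  # 30 m x 30 m = 900 m2, assuming equal-area pixels


def transitions(start: Sequence[str], end: Sequence[str]) -> Counter:
    """Count pixels for each (start_class, end_class) pair."""
    if len(start) != len(end):
        raise ValueError("both maps must cover the same pixels")
    return Counter(zip(start, end))


def percent_change(start: Sequence[str], end: Sequence[str], cls: str) -> float:
    """Relative change (%) in the area of one class between the two dates."""
    a0, a1 = list(start).count(cls), list(end).count(cls)
    if a0 == 0:
        raise ValueError(f"class {cls!r} is absent at the start date")
    return 100.0 * (a1 - a0) / a0


def gain_loss_ha(start: Sequence[str], end: Sequence[str], cls: str) -> Tuple[float, float]:
    """Area (ha) that became cls (gain) and area that stopped being cls (loss)."""
    tab = transitions(start, end)
    gain = sum(n for (s, e), n in tab.items() if e == cls and s != cls)
    loss = sum(n for (s, e), n in tab.items() if s == cls and e != cls)
    return gain * PIXEL_AREA_HA, loss * PIXEL_AREA_HA


# Hypothetical six-pixel maps for the first and last years.
y_first = ["forest", "forest", "forest", "forest", "pasture", "pasture"]
y_last = ["forest", "forest", "pasture", "pasture", "pasture", "agriculture"]
```

Net change equals gain minus loss, so a class can show modest net growth while large areas move in and out of it; reporting gain and loss separately keeps that dynamic visible.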
We built the LULC maps of this study by iterating over map collections, an approach similar to that applied to the MODIS global land cover products [33]. In MapBiomas Collection 3.1, we substantially improved the random forest classifier and built a robust reference dataset for accuracy assessment. The first collection (Collection 1.0) was mainly developed to allow our research team, engineers and data scientists to port our existing classification algorithm and optimize LULC mapping in Google Earth Engine over a short time series (i.e., 2000 to 2016). In Collection 2 (which evolved up to Collection 2.3), we moved from empirical decision trees based on hierarchical rules defined by analysts to a random forest machine learning algorithm. Empirical decision rules have the advantage of making the variables and rules used to map LULC classes easy to understand, and they work well with a small set of classes [61]. However, as we increased the number of LULC classes and hierarchical levels, the empirical rules became so complex that manually partitioning the data into hierarchical binary classes was no longer feasible. To overcome this, we adopted random forest in Collection 2.3, which evolved into Collection 3.1 with unbiased training and accuracy assessment, using existing LULC maps from different sources to randomly select the training samples. Besides the iterative mapping collections, we implemented a flexible LULC mapping protocol that allows each biome to define the feature space and samples for training the random forest classifier (Appendix S1), as long as the biome maps follow the map integration protocol that guarantees spatial and temporal coherence along their transitional ecotone zones.
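Randomly selecting training samples from existing LULC maps can be sketched as stratified random sampling, drawing at most a fixed number of pixels per class so that dominant classes do not swamp the rest. The per-class sample size, the seed and the function name are assumptions for illustration; each biome's own sample design (Appendix S1) may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_sample(reference: Sequence[str], per_class: int, seed: int = 0) -> Dict[str, List[int]]:
    """Draw up to per_class random pixel indices for every class of a reference map.

    Classes with fewer pixels than per_class contribute all of their pixels,
    so rare classes stay represented without duplicating samples.
    """
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, cls in enumerate(reference):
        by_class[cls].append(idx)
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return {cls: rng.sample(idxs, min(per_class, len(idxs)))
            for cls, idxs in by_class.items()}


# Hypothetical reference map: an abundant, a common and a rare class.
reference = ["forest"] * 10 + ["pasture"] * 5 + ["mining"] * 2
samples = stratified_sample(reference, per_class=4, seed=1)
```

Capping samples per class balances the training set only partially: a class with two pixels still contributes just two samples, which is exactly the difficulty rare classes pose for the classifier.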
Yet, despite the information gained with MapBiomas LULC Collection 3.1, challenges and limitations remain. First, overall accuracy was lower than 80% in highly seasonal and heterogeneous biomes (e.g., Cerrado, Caatinga, Pantanal and Pampa). The Amazon biome, whose land cover is mostly forest, had the highest overall mapping accuracy, 95%. However, less predominant LULC classes had lower accuracy in all biomes. For example, the spatial variability of native vegetation types and the spectral similarity among LULC classes, such as grassland and pasture, make them difficult to separate [62], even with hyperspectral images [63]. Second, the reference dataset used to assess mapping accuracy was built by visual interpretation of Landsat color composites, supported by ancillary spectral-temporal time series and higher resolution imagery (when available). We were not able to estimate the classification uncertainty of our reference dataset; this task is in progress. Third, we still need to advance the analysis of LULC transitions and estimate their uncertainties. In this study, we limited the LULC change analysis over the 30+-year period to the LULC classes with lower classification errors (Table 6, Table 7 and Table 8). Further investigation is needed to understand the impact of LULC classification error on estimates of yearly change. Our research group is also exploring methods to characterize how often a pixel changes its LULC class and how many times this happens [64]. Fourth, we recognize that the MapBiomas LULC mapping approach is complex because it involves different data inputs and algorithm parametrization for each biome, and some classes are mapped separately as cross-cutting themes, requiring post-classification integration of multiple classification results. As a single random forest classification failed to include all LULC classes (Table 2), we will likely continue to use cross-cutting themes and post-classification integration of several maps, with class prevalence rules, to compose the final LULC map. Rare classes in our classification schemes (e.g., beach and dunes, aquaculture, mining, salt flat, rocky outcrops) were penalized by the random forest classifier and tended to be under-mapped, as pointed out in global mapping studies [33]. These rare LULC classes were also affected in our accuracy assessment, showing high classification errors. To overcome this issue, we balanced the training classes of our random forest algorithm by using published and accessible reference LULC maps for the Amazon biome, and by adding manually sampled areas that were under-mapped in the other five biomes. However, the impact of rare classes persisted, leading us to analyze in this study the LULC dynamics of only the eight most predominant classes (Table 6). Finally, in future MapBiomas collections we will also attempt to separate agriculture from pasture in the mosaic class and to improve spatial and temporal consistency in some periods of the time series.
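The accuracy measures discussed above can be computed from a confusion matrix that cross-tabulates mapped and reference labels at validation points. The sketch below uses simple sample counts; it is a simplification that omits the area-weighted estimators usually applied to stratified samples, and the labels in the example are hypothetical, chosen to mimic grassland/pasture confusion.

```python
from typing import List, Optional, Sequence


def confusion_matrix(reference: Sequence[str], mapped: Sequence[str], classes: List[str]) -> List[List[int]]:
    """Rows are mapped classes, columns are reference classes."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, mapped):
        m[index[pred]][index[ref]] += 1
    return m


def overall_accuracy(m: List[List[int]]) -> float:
    """Fraction of validation points on the diagonal."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def users_accuracy(m: List[List[int]], i: int) -> Optional[float]:
    """Share of points mapped as class i that are class i in the reference (1 - commission error)."""
    row = sum(m[i])
    return m[i][i] / row if row else None


def producers_accuracy(m: List[List[int]], i: int) -> Optional[float]:
    """Share of reference points of class i that were mapped as class i (1 - omission error)."""
    col = sum(row[i] for row in m)
    return m[i][i] / col if col else None


classes = ["forest", "grassland", "pasture"]
reference = ["forest"] * 4 + ["grassland", "grassland", "pasture", "pasture"]
mapped = ["forest"] * 4 + ["pasture", "grassland", "pasture", "grassland"]
cm = confusion_matrix(reference, mapped, classes)
```

Here forest is mapped perfectly while the two spectrally similar classes each reach only 50% accuracy, so an overall accuracy of 75% hides much weaker per-class results, as happens with less predominant classes.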
Several studies have already been published by the MapBiomas network to better understand the spatial-temporal LULC dynamics of Brazilian biomes, focusing on specific LULC classes: for example, surface water dynamics in the Amazon region [65], Cerrado native vegetation change [62], and mangrove [66] and pastureland dynamics and characterization over the whole country [67]. New research methods to explore and analyze LULC trajectories were applied to the Caatinga biome with the MapBiomas LULC time series [68]. The Greenhouse Gas Emission and Removal Estimating System (SEEG) [69] uses the annual LULC data to estimate GHG emissions and removals for the land use sector in Brazil, which accounted for 44% of GHG emissions in 2018 (SEEG 7, available at https://seeg.eco.br/). Indeed, reducing the LULC uncertainty of the SEEG GHG estimates was the main reason our research group launched the MapBiomas Project.
All products, methods and tools of the MapBiomas Project are open access, transparent and publicly available on the internet (https://mapbiomas.org/) for non-commercial use. Open access made it possible to refine the LULC maps with feedback from end users, who numbered one hundred thousand in 2019. Additionally, more than one hundred peer-reviewed research articles using the LULC maps of this project were published between 2017 and 2019. Since the first MapBiomas Collection, scientific applications of this dataset have kept growing, including, for example, the assessment of conservation and biodiversity policies [73], climate change impacts [75], and the mapping of human disease risks, including hantavirus [76], yellow fever [77] and leishmaniasis [78]. The LULC maps presented in this study are therefore already helping to inform the scientific community, policy makers and civil society organizations. In this study, we present an in-depth description of the methodology and processes used to build the MapBiomas collections, and a rigorous assessment of map accuracy, which is required to support existing and emerging scientific and societal applications of our LULC map collections. In addition, we advance the understanding of historical LULC dynamics in the Brazilian biomes and of their main drivers of change.