Maximum Fraction Images Derived from Year-Based Project for On-Board Autonomy-Vegetation (PROBA-V) Data for the Rapid Assessment of Land Use and Land Cover Areas in Mato Grosso State, Brazil

This paper presents a new approach for rapidly assessing the extent of land use and land cover (LULC) areas in Mato Grosso state, Brazil. The novel idea is the use of an annual time series of fraction images derived from the linear spectral mixing model (LSMM) instead of the original bands. The LSMM was applied to the Project for On-Board Autonomy-Vegetation (PROBA-V) 100-m data composites from 2015 (~73 scenes per year, cloud-free images, in theory), generating vegetation, soil, and shade fraction images. These fraction images highlight the LULC components inside the pixels. The other new idea is to reduce these time series to only six single bands representing the maximum and standard deviation values of these fraction images in an annual composite, reducing the volume of data needed to classify the main LULC classes. The whole image classification process was conducted in the Google Earth Engine platform using the pixel-based random forest algorithm. A set of 622 samples covering all LULC classes was collected by visual inspection of PROBA-V and Landsat-8 Operational Land Imager (OLI) images and divided into training and validation datasets. The performance of the method was evaluated by the overall accuracy and confusion matrix. The overall accuracy was 92.4%, with the lowest misclassification found for cropland and forestland (<9% error). The same validation dataset showed 88% agreement with the LULC map made available by the Landsat-based MapBiomas project. The proposed method has the potential to be used operationally to accurately map the main LULC areas and to rapidly exploit the PROBA-V dataset at regional or national levels.


Introduction
Mato Grosso state has the second largest area of deforestation and forest degradation in the Brazilian Legal Amazonia (BLA), a Brazilian political division that encompasses the states of Acre, Amapá, Amazonas, Mato Grosso, Pará, Rondônia, Roraima, Tocantins, and part of Maranhão [1,2]. Mato Grosso is the main grain and beef cattle producer in the BLA [3,4]. Its northern part is covered by the Amazonia biome, while the southern part is covered by the Cerrado and Pantanal biomes. This state has a key role in understanding the Brazilian land use and land cover (LULC) dynamics, since several processes of change in LULC that are occurring in Mato Grosso are being repeated in other Brazilian states covered by the Amazonia and Cerrado biomes. These are the cases of the states of Pará,

Land 2020, 9, 139 3 of 20

Annual or monthly composite images based on the maximum vegetation fraction values ensure that cloudy and off-nadir pixels have low chances of being selected in the final composites, as stated by [30], and allow better characterization of LULC classes with strong seasonality (e.g., tropical savannas, croplands, and pasturelands), as shown by [34]. Since different LULC classes in Mato Grosso state present distinct annual behavior in terms of the proportion of vegetation, soil, and shade, in theory, the maximum fraction images (and their corresponding standard deviations) have the potential to better discriminate the most representative LULC classes from this region (forestlands, shrublands, grasslands, rain-fed croplands, and pasturelands). In this context, this study aimed to present a new method to map the major LULC classes in Mato Grosso state based on the maximum value proportional fractions of vegetation, soil, and shade derived from the 2015 PROBA-V 100-m images.

Study Area
The study area corresponded to Mato Grosso state, located in midwestern Brazil in the BLA region. This area covers approximately 900,000 km², is characterized by a heterogeneous landscape (highlands, lowlands, wetlands, depressions, and plateaus), and encompasses three Brazilian biomes: Amazonia (tropical rainforest, 56.7% of the state), Cerrado (tropical savanna, 37.4%), and Pantanal (wetland, 5.9%) (Figure 1) [35]. Dense and open ombrophilous forests are found in the northern part of the state (Amazonia); savannas with diverse proportions of grasses, shrubs, and trees are found in the plateaus and lowlands in the central and southern parts of the state (Cerrado); and wetlands are found in the southwestern part of the state (Pantanal) [35]. These heterogeneous natural and anthropogenic landscapes make the LULC classification quite challenging. In addition, Mato Grosso is the largest producer and exporter of soybeans, corn, and cotton in Brazil [36]. There are also extensive areas of pasture, reforestation by Eucalyptus, and sparsely distributed forest logging [3,5,37,38].

Datasets
This study was based on the PROBA-V Collection 1 (C1) top of canopy daily synthesis product obtained in the entire year of 2015. PROBA-V C1 images are produced every five days with a spatial resolution of 100 m at nadir and with radiometric and geometric corrections [39]. PROBA-V is the payload of a satellite mission to map land cover and vegetation and was designed to ensure continuity of the Satellite Pour l'Observation de la Terre Vegetation (SPOT-VEGETATION) mission [22,40]. The sensor operates with four bands in the visible and infrared spectra (blue, 0.435-0.490 µm; red, 0.610-0.700 µm; near infrared (NIR), 0.760-0.930 µm; and shortwave infrared (SWIR), 1.520-1.670 µm). The PROBA-V mission also provides the status map (SM) information (pixel radiometric quality data) and the normalized difference data between the NIR and red bands, called the normalized difference vegetation index (NDVI) (not used in this study).
As ancillary data, we selected two sets of Landsat-8 Operational Land Imager (OLI) satellite data obtained in 2015 (bands 4, 5, and 6 in the red, NIR, and SWIR spectral regions, respectively; spatial resolution of 30 m; repeat pass of 16 days). One set was from the crop-growing season (July-December), and the other set was from the harvesting season (January-July). The MODIS-based time series of NDVI signatures available in the Satellite Vegetation (SATVeg) web tool, developed by the Brazilian Agricultural Research Corporation (Embrapa), were also used in this study. The LULC map of 2015, produced by the MapBiomas project, was the main vector-based data considered in this study.

Data Processing
The main steps of the methodological approach applied in this study and conducted in the Google Earth Engine platform [41] are shown in Figure 2. First, we imported the PROBA-V 100-m scenes from 2015 into the Google Earth Engine platform and clipped them to the Mato Grosso state boundary [42]. The following bands were considered: blue, red, NIR, SWIR, and SM. The annual mosaic of cloud-free images from 2015 was generated by a script that masked out pixels flagged as cloud or cirrus in the SM band [40]; thus, only pixels with good radiometric quality, that is, those classified in the SM band as 0 (clear), were retained.
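The masking rule described above can be illustrated outside Earth Engine with plain Python. This is only a minimal sketch: the function name `mask_clear`, the toy 2 × 2 arrays, and the non-zero flag value 3 are hypothetical, and the only rule taken from the text is that an SM value of 0 marks a clear pixel.

```python
# Status-map value that the text treats as a clear, good-quality observation.
CLEAR = 0

def mask_clear(reflectance, status_map, clear_value=CLEAR):
    """Return a copy of `reflectance` with every non-clear pixel set to None."""
    return [
        [value if sm == clear_value else None
         for value, sm in zip(row, sm_row)]
        for row, sm_row in zip(reflectance, status_map)
    ]

# Toy 2 x 2 scene; the value 3 stands in for any cloud/cirrus flag (illustrative).
scene = [[0.12, 0.40], [0.35, 0.08]]
status = [[0, 3], [0, 0]]
masked = mask_clear(scene, status)
```

Masked pixels are kept as `None` so that later per-pixel statistics can simply skip them.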
Next, we selected homogeneous areas of bare soils, large croplands, and clear water bodies in the PROBA-V annual mosaic to obtain the pure endmembers of soil, vegetation, and shade, respectively. Then, the soil, vegetation, and shade fraction images were obtained by selecting endmembers by visual inspection of the median values of the blue, red, NIR, and SWIR bands. The representative reflectance values of each endmember in the blue, red, NIR, and SWIR spectral regions for Mato Grosso state are shown in Figure 3.
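Once endmember spectra are available, the fractions of a pixel can be estimated by least squares. The sketch below is an illustration rather than the processing chain used in this study: the reflectance values in `ENDMEMBERS` are hypothetical (they are not those of Figure 3), the helper names are invented, and the solution is an unconstrained least-squares fit whose results are clipped to the range 0 to 1 afterward, which is a simplification of a properly constrained unmixing.

```python
def solve_linear_system(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def unmix(pixel, endmembers):
    """Least-squares fraction of each endmember for one pixel spectrum."""
    names = list(endmembers)
    spectra = [endmembers[name] for name in names]
    # Normal equations (E^T E) f = E^T r, where the columns of E are the spectra.
    ete = [[sum(p * q for p, q in zip(si, sj)) for sj in spectra] for si in spectra]
    etr = [sum(p * r for p, r in zip(si, pixel)) for si in spectra]
    fractions = solve_linear_system(ete, etr)
    # Clipping to [0, 1] is a shortcut, not a true constrained solution.
    return {name: min(1.0, max(0.0, f)) for name, f in zip(names, fractions)}

# Hypothetical reflectances in the blue, red, NIR, and SWIR bands.
ENDMEMBERS = {
    "vegetation": [0.03, 0.04, 0.45, 0.20],
    "soil": [0.10, 0.20, 0.30, 0.40],
    "shade": [0.02, 0.01, 0.01, 0.005],
}

# Synthetic pixel built as an exact mixture, so the fit should recover it.
true_fractions = {"vegetation": 0.5, "soil": 0.3, "shade": 0.2}
pixel = [
    sum(true_fractions[name] * spectrum[band] for name, spectrum in ENDMEMBERS.items())
    for band in range(4)
]
estimated = unmix(pixel, ENDMEMBERS)
```

Because the synthetic pixel is an exact mixture with no noise, the estimated fractions match the ones used to build it up to floating-point error.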
The LSMM assumes that each pixel value is a linear combination of the reflectance values of a given number of elements (endmembers). A fraction image is obtained for each endmember, representing the proportion of that endmember within a pixel, by minimizing the sum of squared errors with respect to the original data [27]. The proportion of each element is constrained to the range from 0% (absence) to 100% (maximum presence) in each pixel. The LSMM can be represented by Equation (1):

$$ r_i = a \cdot \mathrm{vege}_i + b \cdot \mathrm{soil}_i + c \cdot \mathrm{shade}_i + e_i \qquad (1) $$

where $r_i$ is the response of the pixel in band $i$; $a$, $b$, and $c$ are the proportions of vegetation, soil, and shade within the pixel, respectively; $\mathrm{vege}_i$, $\mathrm{soil}_i$, and $\mathrm{shade}_i$ are the spectral responses of the vegetation, soil, and shade endmembers in band $i$; $e_i$ is the error term in band $i$; and $i$ indexes the PROBA-V bands (blue, red, NIR, and SWIR).

The set of PROBA-V images obtained in 2015 was reduced to three maximum and three standard deviation fraction images of vegetation, soil, and shade. This approach ensured that the highest fraction value was selected in the composites within the year [30]. Therefore, the information of one year of PROBA-V images (in theory, 73 cloud-free images) was condensed into six bands. The novel idea is that the highest vegetation fraction value refers to the most likely class, and the standard deviation of its fractions represents the seasonality of that class throughout the year. The standard deviation of the fraction images is calculated as the deviation from the average value of each fraction image in the annual mosaic composites.
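The reduction of a yearly fraction series to a maximum and a standard deviation can be sketched per pixel as follows. The function name and the sample series are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not state which estimator was applied.

```python
from statistics import pstdev

def annual_max_and_std(series):
    """Maximum and standard deviation of one pixel's fraction series.

    `series` holds one fraction value per composite; None marks a masked
    (cloudy) observation and is skipped.
    """
    valid = [value for value in series if value is not None]
    if not valid:
        return None, None
    return max(valid), pstdev(valid)

# Illustrative vegetation-fraction series, not measured data.
cropland_like = [0.1, 0.8, None, 0.3]   # strong seasonality
forest_like = [0.5, 0.5, 0.6, 0.5]      # weak seasonality

crop_max, crop_std = annual_max_and_std(cropland_like)
forest_max, forest_std = annual_max_and_std(forest_like)
```

Applying the same function to the vegetation, soil, and shade series of each pixel yields the six bands described above; in this toy case the cropland-like series has both the higher maximum and the larger standard deviation.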

LULC Description
The LULC classes considered in this study were annual croplands, forestlands, water bodies, humid savanna, pasturelands, and dry savanna. These classes were discriminated based on the variations in the proportions of the soil, vegetation, and shade endmembers, according to the seasonality, instead of analyzing their spectral behaviors, which invariably change with the sensor and spatial resolution [43]. This means that each class has a typical annual behavior according to its increase or decrease in vegetation, soil, and shade proportions throughout the year [30,31]. For instance, in Mato Grosso, croplands are mostly represented by annual, rain-fed plantations of soybean, maize, and cotton, while pasturelands are mostly represented by planted pastures with Brachiaria species. Therefore, the terrain conditions in croplands over the year are significantly more variable than those in planted pastures, ranging from green vegetation cover to bare soils or straw, which explains the higher seasonality of the soil and vegetation fraction images throughout the year [30,31,44]. In other words, croplands have high vegetation cover, low soil cover, and low shade fraction during the peak of the growing season; however, during the harvesting and preseeding phases, croplands have low vegetation, high soil, and low shade fractions (Figure 4).

Forestlands include the Amazon tropical rainforest located in the northern part of Mato Grosso state, Cerrado woodlands, riparian forests, and forest regenerations of varying ages. In the Cerrado biome, they are characterized by deciduous and semideciduous forest with trees 15 to 20 m in height and leaf losses no greater than 50%; in the Amazonia biome, they are characterized by open forest with trees 25 to 30 m in height and canopy cover greater than 80% [45,46]. In addition, forestlands have a medium vegetation fraction, low soil proportion, medium shade fraction, and low standard deviation for all fractions throughout the year because of the weak seasonality, as shown in the boxplots of Figure 4. Water bodies include major rivers and perennial lakes from the wetlands of the Pantanal biome and present the highest shade proportion throughout the year (boxplot of Figure 4C) and almost null proportions of vegetation and soil (Figure 4A,B, respectively).
Humid savanna corresponds to the natural grass-dominated strata mostly found in the wetlands of the Pantanal biome, which are highly susceptible to flooding during the wet season (October to March) and occur mainly in poorly drained soils [31]. Because of the sparse trees, up to 3 m high and covering less than 20% of the surface, these areas can be identified by a slightly higher fraction of shade compared with that of other vegetated classes, as shown in the boxplots of Figure 4C [45]. Dry savanna corresponds to the mixture of grass-, shrub-, and tree-dominated layers in varying proportions that occurs mainly in the Cerrado biome [6,45]. Savannas and pasturelands usually present medium soil, vegetation, and shade fractions and moderate seasonality, which results in higher standard deviations than those of forestlands. In Figure 4, they fall between the cropland and forestland classes. Additionally, a lower soil fraction is expected on managed pastures than on natural grasslands, as well as a lower shade fraction than on humid savannas. Dry savannas are characterized by a higher proportion of shrubs and trees in relation to pasturelands and humid savannas, which may be characterized by an intermediate proportion of the soil and vegetation fraction [45].

Sampling Design and Classification
To ensure the representativeness of sampling, a set of 622 samples (348 in the Amazonia biome, 234 in the Cerrado, and 40 in the Pantanal) was randomly distributed over the study area and labeled by visual inspection (Figure 5).

In total, we selected 151 cropland samples (84 of them for validation); 128 forestland samples (56); 100 samples each of pastureland, humid savanna, and dry savanna (36, 37, and 36, respectively); and 43 water body samples (14). The number of samples for water bodies was low because they occupy a relatively small area in Mato Grosso state. We collected samples only from perennial large rivers, streams, and lakes. For the dry savanna, the small number of samples was due to confusion with pastures, especially because that class comprises several other classes with a gradient of shrubs and grasslands. Training samples were chosen empirically by observing the sample distribution, accuracy assessment, and classification performance.
In contrast, the number of validation samples was calculated using the statistical approach proposed by [47], which consists of generating an adequate probability sample through stratified random sampling. The validation sample size was based on the areal extent of each stratum and its expected accuracy, so that the sample was large enough to produce sufficiently precise estimates. As there was no reference map with the same LULC classes for the study area, a preliminary classification map was produced using two-thirds of the total training samples for training and one-third for testing (Figure 5) to obtain the expected accuracy of each LULC class. The parameters were set as follows: a standard error of 2% for the expected overall accuracy; an expected user's accuracy of 90% for the high-confidence classes (forestlands, croplands, and water bodies); and an expected user's accuracy of 70% for the low-confidence classes (humid savanna, pastureland, and dry savanna). Since some classes are identified more easily than others, these expected accuracies influence the overall sample size.
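The text does not reproduce the formula from [47]. As an illustration only, the sketch below assumes the expression commonly used for stratified sampling, n = (Σ W_i S_i / S(O))², where W_i is the area proportion of stratum i, S_i = sqrt(U_i (1 - U_i)) is the standard deviation implied by the expected user's accuracy U_i, and S(O) is the target standard error of the overall accuracy. The two-stratum weights in the example are hypothetical and are not the class areas of this study.

```python
import math

def validation_sample_size(weights, expected_users_accuracy, target_se):
    """Total sample size for a target standard error of overall accuracy.

    weights: area proportion of each stratum (should sum to 1).
    expected_users_accuracy: expected user's accuracy of each stratum (0 to 1).
    target_se: target standard error of the overall accuracy (e.g., 0.02).
    """
    weighted_sd = sum(
        w * math.sqrt(u * (1.0 - u))
        for w, u in zip(weights, expected_users_accuracy)
    )
    return math.ceil((weighted_sd / target_se) ** 2)

# Hypothetical case: two equally large strata, one easy (90%) and one hard (70%).
n_total = validation_sample_size([0.5, 0.5], [0.90, 0.70], target_se=0.02)
```

Under this assumed formula, strata with a lower expected user's accuracy have a larger S_i and therefore push the required sample size upward, which is consistent with the remark that easily identified classes need fewer samples.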
Finally, the mosaics were processed by the random forest classifier in the Google Earth Engine platform. Random forest is a supervised classifier that builds a set of multiple decision trees, each of which handles a random selection of the training data drawn by bootstrap sampling [48]. The random resampling used to build the trees makes the classifier robust when dealing with a low number of training samples [49]. For each bootstrap sample, a tree node is built by gaining information from the comparison of the multiple attributes (spectral bands). This procedure is repeated for the other nodes of the tree, which grows without pruning. In this way, k trees are generated, forming the (random) forest [48]. The parameter k was set to 30, which is the maximum number of trees to be created.
The accuracy was evaluated using the confusion matrix and by comparison with the results obtained by the MapBiomas project (Table 1). The same validation dataset was used to assess the agreement with the MapBiomas project classes according to [50,51]. As MapBiomas raster-based LULC products are available at 30-m spatial resolution, the data were degraded to 100-m spatial resolution by assigning the most frequent class within each pixel (mode), enabling intercomparison between the datasets (https://code.earthengine.google.com/?accept_repo=users/mapbiomas/user-toolkit). The legend consistency between this study and the MapBiomas project is shown in Table 1.
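The accuracy measures derived from a confusion matrix can be sketched as follows. The function names and the six labeled samples are hypothetical and serve only to show how overall, user's, and producer's accuracies are computed from paired reference and mapped labels.

```python
from collections import Counter

def confusion_counts(reference, predicted):
    """Count samples for every (reference class, mapped class) pair."""
    return Counter(zip(reference, predicted))

def overall_accuracy(reference, predicted):
    """Share of samples whose mapped class equals the reference class."""
    correct = sum(r == p for r, p in zip(reference, predicted))
    return correct / len(reference)

def users_accuracy(reference, predicted, label):
    """Of the samples mapped as `label`, the share that truly are `label`."""
    mapped = [r for r, p in zip(reference, predicted) if p == label]
    return sum(r == label for r in mapped) / len(mapped)

def producers_accuracy(reference, predicted, label):
    """Of the samples that truly are `label`, the share mapped as `label`."""
    truth = [p for r, p in zip(reference, predicted) if r == label]
    return sum(p == label for p in truth) / len(truth)

# Illustrative labels only: one cropland sample is mapped as pastureland.
reference = ["crop", "crop", "forest", "forest", "pasture", "pasture"]
predicted = ["crop", "pasture", "forest", "forest", "pasture", "pasture"]

matrix = confusion_counts(reference, predicted)
oa = overall_accuracy(reference, predicted)
ua_pasture = users_accuracy(reference, predicted, "pasture")
pa_crop = producers_accuracy(reference, predicted, "crop")
```

In this toy case the single cropland sample mapped as pastureland lowers the user's accuracy of pastureland and the producer's accuracy of cropland, the same kind of confusion discussed later for these two classes.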

Maximum and Standard Deviation Fractions of Vegetation, Soil, and Shade
The annual time-series composites of fraction images were reduced to three maximum fraction bands and three standard deviation bands. Figure 6 shows the frequency histogram of each of the LULC classes in these six bands. In terms of vegetation fraction, the forestland, cropland, and pastureland classes presented relatively high responses due to the high absorption of photosynthetically active radiation by leaves and canopies throughout the year. However, these LULC classes differ in their seasonality, which is more evident in croplands and pasturelands than in forests, as noted by the high standard deviation of the vegetation fraction (Figure 6D). The forestland defined here is characterized by semideciduous forest and open or dense tropical forest with no pronounced winter, so the proportion of vegetation is quite constant throughout the year, with a medium proportion of shade in the dry season, as 20-50% of the leaves fall from May to September.
The use of the maximum values of vegetation and soil fractions within a year allows a separation between cropland and forestland or between forestland and pastureland. The soil fraction presented the opposite behavior of maximum vegetation fraction in vegetated classes; the proportion of soil increases during the harvesting and seeding period when the soil becomes exposed, and the vegetation proportion decreases. At the end of the crop cycle (January-April), the maximum vegetation proportion occurs in the cropland class and vice versa in the soil fraction (Figure 6A,B).
Pastureland and savannas behaved similarly to cropland in the maximum and standard deviation of the vegetation and soil fractions, but at lower levels (Figure 6A,B,D,E). Pastureland and savanna canopies are more heterogeneous than those of cropland, presenting higher levels of mixed vegetation, soil, and shade fractions during the year. Therefore, pastureland can be discriminated from forestland and cropland by the intermediate values of the standard deviation of the vegetation and soil fractions (Figure 6C,D). Similarly, the maximum fraction of the shade image was helpful to differentiate pastureland from humid savanna and dry savanna because of the stronger seasonality of the pastureland. Pastureland presented a higher seasonality of the soil fraction than the humid and dry savannas, as depicted in the standard deviation of the soil fraction in Figure 6E, and lower maximum shade fractions. The humid savanna is differentiated by the relatively high amount of shade in the maximum annual composite. As the water bodies absorb most of the incident radiation in the visible and infrared spectra, they are clearly separated by their maximum shade fraction due to the low reflectance of this target; consequently, the water bodies also presented low proportions of vegetation and soil fractions.

Figure 7 shows the RGB (Red-Green-Blue) color composite of the maximum soil, vegetation, and shade fractions in Mato Grosso state. Water bodies (rivers and lakes) and forestland appear bluish and greenish in this false-color composite, respectively. Cropland has a similar proportion of maximum fractions of soil and vegetation throughout the year, so it appears yellowish.
Table 2.

LULC Map
The dominant LULC class in Mato Grosso is forestland, covering an area of approximately 422,000 km² (45% of the state), mostly in the northern and southern parts of the state, followed by pastureland (~228,000 km², 24% of the state), basically found throughout the state. The spatial distribution of forestland and pastureland agrees quite well with the MODIS-based LULC map of Mato Grosso from 2017 [52]. Forestland and pastureland are also the dominant LULC classes in the part of the state covered by the Amazonia biome: 64% and 26%, respectively. Forestland, humid savanna, and pastureland are the three dominant LULC classes in areas occupied by the Cerrado biome: 25%, 27%, and 26%, respectively. In the Pantanal biome, we found 35% forestland and 32% humid savanna, the two most dominant LULC types in this biome.
A large portion of forestland in Mato Grosso is protected by indigenous lands, especially by the Parque do Xingu and the Parque do Aripuanã indigenous lands in the northeastern and northwestern regions of Mato Grosso, as well as the Paresi indigenous land in the western part of the state (Cerrado biome). Surprisingly, there is a lack of permanently protected conservation units in Mato Grosso, which are commonly found in other states in the Amazonia biome. Cropland occurs mostly in the central part of the state, adjacent to the cities of Nova Mutum, Lucas do Rio Verde, Sorriso, and Sinop, in the Cerrado/Amazonia ecotone region, and along the BR-163 highway (Cuiabá-Santarém highway, not shown in Figure 8), where most of the cropland and forest losses due to clear cutting and selective logging are found [53,54]. Primavera do Leste, located in the Cerrado biome, is the major grain-producing region in the state. Juína, Colniza, and Alta Floresta, mostly occupied by pasturelands, are important in terms of Amazon biodiversity conservation. These are priority regions of the Brazilian Institute of Environment and Renewable Natural Resources (IBAMA) for combating and controlling illegal deforestation in the BLA, since they are located near Amazonas state, where large fragments of pristine rainforest are found. The largest areas of humid savannas in Mato Grosso are found in the region of São Félix do Araguaia, located over the alluvial sediments of the Araguaia River basin [55,56]. In Brazil, the Araguaia River basin and the Pantanal are the two largest wetlands influenced by tropical, seasonal floods.

Classification Performance and Uncertainties
The proposed approach demonstrated that the highest value of the fraction images in the annual composite allowed the discrimination of LULC classes based on their highest peaks during the year, while the standard deviation emphasized the seasonality of these classes, improving the separability of LULC classes with similar spectral responses. However, some misclassifications remained, for example, the confusion between pasturelands and croplands that has already been reported by several authors [5,6,17]. Managed pastures promote vegetation growth similar to that of croplands. Such homogeneous vegetation cover confuses the classifier, reducing the accuracy of this class [5,6]. For natural classes from the Cerrado biome, such as the dry and humid savannas reported here, the misinterpretation may be even worse. According to the Cerrado vegetation classification system proposed by [45], there are at least seven different phytophysiognomies in the Cerrado biome, which differ in terms of vegetation cover, amount of biomass, and proportion of trees, shrubs, and grasses. In addition, changes in LULC during the observed year can produce classification errors, because the classifier does not necessarily assign the class that follows the last change. For instance, a pixel converted from forest to pasture could still be classified as forest because of the high maximum vegetation fraction recorded earlier in that year.
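The reduction of a fraction time series to its annual maximum and standard deviation can be illustrated with a minimal sketch. It is not the Google Earth Engine workflow used in the study: the function names, the band order, the use of the population standard deviation, and the pixel values are all assumptions made for illustration. The example pixel is forest until mid-year and pasture afterwards, which reproduces the situation described above.

```python
from statistics import pstdev

def annual_metrics(series):
    """Return (maximum, standard deviation) of one fraction time series.

    `series` holds one value per composite; None marks a cloudy or
    missing observation and is skipped. Population standard deviation
    is an assumption of this sketch.
    """
    clear = [v for v in series if v is not None]
    if not clear:
        return None, None
    return max(clear), pstdev(clear)

def six_band_features(vegetation, soil, shade):
    """Stack max and std of the three fractions into six features.

    Assumed order: veg max, veg std, soil max, soil std, shade max, shade std.
    """
    features = []
    for series in (vegetation, soil, shade):
        features.extend(annual_metrics(series))
    return features

# Hypothetical pixel: forest until mid-year, then cleared for pasture.
vegetation = [0.85, 0.88, None, 0.86, 0.30, 0.25, None, 0.35]
soil       = [0.05, 0.04, None, 0.05, 0.55, 0.60, None, 0.50]
shade      = [0.10, 0.08, None, 0.09, 0.15, 0.15, None, 0.15]

features = six_band_features(vegetation, soil, shade)
print(features)
```

In this made-up case the maximum vegetation fraction remains forest-like (0.88) although the pixel ends the year as pasture; only the large standard deviation and the high maximum soil fraction hint at the change.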
Compared with the validation sample design applied in the MapBiomas project, we noticed that cropland, forestland, water bodies, and pastureland presented similar user and producer accuracies (92-100%) (Table 4). However, dry and humid savannas presented high levels of mismatches (53-64%), probably due to the legend definitions. The overall classification accuracy for Mato Grosso state was similar to that reported by the MapBiomas project, which was 90% in 2015 [16]. The overall accuracy in terms of biome was 95% for the Amazonia, 83% for the Cerrado, and 81% for the Pantanal, demonstrating again the greater difficulty in mapping savanna formations (https://mapbiomas.org/en/estatistica-de-acuracia?cama_set_language=en). The total area per class agreed with that of the previous study [16] for most classes (Table 5). One of the main differences between the two mappings was the extent of cropland, for which the estimates differed by 32%. Here, it is important to note that the MapBiomas maps do not differentiate between annual and perennial crops, whereas perennial crops are classified as forestlands in our classification. Our approach did not separate perennial and semiperennial crops from annual crops because of the low seasonality of the perennial and semiperennial crops. If the semiperennial crops (sugarcane, for example) were excluded from the area comparison based on the MapBiomas project, the difference would be 10% higher. Furthermore, both initiatives have the limitation of not classifying more than one annual harvest cycle (typically, double cropping), a common practice in Mato Grosso state [34,57]. Other studies showed larger cropland estimates than those in our method: 98,488 km² [58] and 98,723 km² [59], in which double-cropped areas were counted twice [59]. 
To overcome this limitation, one possibility is the use of a crop calendar (wet-dry seasons), as proposed by [60], including the short and long crop cycles in the estimates, or the use of a metric that clearly identifies two cycles in the same time series [19,52,57]. The forestland showed a 3% difference between the two studies. In the MapBiomas project, the tree-dominated savannas, tree-dominated wetlands, and the Amazon rainforest were grouped in the forestland class. The area estimation of pastureland differed by only 1% in relation to the MapBiomas project, even though pasturelands are difficult to map because their spectral response is similar to that of savanna grasslands and other land covers [5]. Farmers often use natural fields for cattle ranching, making it difficult to differentiate them from native savanna grassland [61]. As a result, pastures were confused with dry savanna (Table 1). The commission error was better than the omission error, leading to a most likely overestimation of the area of pastures. Pasture conversion represented an average of 79% of new cropland area in Brazil and 20% of natural vegetation conversion during the 2000-2014 period, which represents approximately 8% of pasture conversion per year [59].
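For readers less familiar with the accuracy terms used above, the following sketch shows how overall, user's, and producer's accuracies follow from a confusion matrix. The matrix layout (rows as mapped classes, columns as reference classes), the function name, and the counts are assumptions chosen for illustration; they are not the values in the tables of this study.

```python
def accuracy_metrics(matrix):
    """Overall, user's and producer's accuracy from a confusion matrix.

    Assumed layout: rows are mapped (classified) classes, columns are
    reference classes. Every row and column must contain samples.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # User's accuracy = 1 - commission error (per mapped class)
    users = [matrix[i][i] / row_totals[i] for i in range(n)]
    # Producer's accuracy = 1 - omission error (per reference class)
    producers = [matrix[j][j] / col_totals[j] for j in range(n)]
    return correct / total, users, producers

# Hypothetical counts for two classes: pastureland, dry savanna.
matrix = [
    [90, 20],  # mapped as pastureland
    [10, 80],  # mapped as dry savanna
]
overall, users, producers = accuracy_metrics(matrix)
print(f"overall={overall:.3f}")
print("user's accuracy:", [round(u, 3) for u in users])
print("producer's accuracy:", [round(p, 3) for p in producers])
```

In this made-up matrix, pastureland is mapped 110 times against 100 reference samples, so its commission error (about 18%) exceeds its omission error (10%) and the mapped total is larger than the reference total.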
The greatest difference in area estimation was found in the water body class, which was 4.1 times lower than the MapBiomas project estimation. The main reason for this difference was probably the lower spatial resolution of PROBA-V (100 m) in comparison to that of the Landsat collection (30 m) used in the MapBiomas classification. Another possible reason is the underestimation of water in the Pantanal biome in this study because of the use of the maximum shade fraction in the annual composite, unlike other initiatives [e.g., 16], which perform classification using wet season datasets when there is a lack of data. Another Brazilian initiative, called TerraClass, computed 4747 km² of water bodies in Mato Grosso state in 2014 using Landsat mosaic composites and visual interpretation [62]. In this sense, the proposed approach minimizes the influence of the seasonality of the native vegetation of the savanna and wetlands by not fixing single dry or wet periods, which, in turn, vary throughout Mato Grosso state [63] and may affect the area estimation in LULC mappings.

Novel Approach
The results showed inconsistencies in the classification of certain classes. These uncertainties may be associated with the heterogeneity of the landscape within the 100 m × 100 m cell of the sensor used, which makes it difficult to map small environmental changes that fall below the spatial resolution. On the other hand, the better temporal resolution increased the probability of finding pixels without the influence of clouds, making it possible to extract values at least once a year, which might not occur with sensors of lower temporal resolution.
Thus, from the point of view of quickly assessing the main LULC classes with low demands on computing (through cloud computing) and human resources, the proposed approach has the advantage of a higher availability of cloud-free pixels. For example, Landsat-like sensors can provide a maximum of 23 cloud-free observations in a year, and the Sentinel-2A and Sentinel-2B satellites up to 60 (only Sentinel-2A scenes are available in Brazil, so far), while PROBA-V can provide up to 73 synthetic 5-day composites [64]. In fact, these are hypothetical values that depend on the frequency of cloud cover, especially during the rainy season (October-April), which reduces data availability. Figure 9 shows the frequency of pixels free of cloud and cloud shadow over Mato Grosso state in 2015 and a comparison with moderate spatial resolution sensors. On average, each pixel had 29 observations free of cloud and cloud shadow over the year (Figure 9B). In addition, there was a clear gradient in the spatial distribution of data availability from east to west (Figure 9A). In the east, a drier region with a predominance of Cerrado physiognomies, we found more clear observations than in the west, which is characterized by dense Amazon forest (Figure 9A). Data availability is a limiting factor in reconstructing the phenological behavior of the main LULC classes in a time series classification [57]. Nevertheless, PROBA-V imagery offers an excellent trade-off between data availability and spatial resolution, with a minimum of 20 "clear" pixels per year (Figure 9A).
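The per-pixel count of clear observations discussed above can be sketched as follows. This is only an illustration of the counting step, assuming that each composite comes with a boolean mask that is true where a pixel is free of cloud and cloud shadow; the function name and the tiny mask stack are hypothetical.

```python
def clear_observation_count(masks):
    """Count clear observations per pixel.

    `masks` is a list of composites; each composite is a list of booleans,
    True where the pixel is free of cloud and cloud shadow.
    """
    n_pixels = len(masks[0])
    return [sum(1 for composite in masks if composite[p]) for p in range(n_pixels)]

# Hypothetical stack: 5 composites x 3 pixels
masks = [
    [True, False, True],
    [True, False, False],
    [False, False, True],
    [True, True, True],
    [True, False, True],
]

counts = clear_observation_count(masks)
mean_clear = sum(counts) / len(counts)
min_clear = min(counts)
print(counts, mean_clear, min_clear)
```

Applied to a full year of masks, the mean and minimum of such counts correspond to the kind of summary statistics reported for Figure 9.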

Perspective
The method presented encouraging results for Mato Grosso state, but it may face some difficulty in other regions or in discriminating more detailed LULC classes. For example, the Cerrado biome is characterized by landscape patterns varying from open grassland to dense canopies; therefore, obtaining accurate LULC classification in this biome may be slightly challenging [65]. In this study, we grouped Cerrado physiognomies into dry savanna and humid savanna to produce LULC classes with good performance. However, if there is a necessity to discriminate among the seven types of physiognomies (two grassland classes, three shrubland classes, and two forestland classes) according to the Cerrado vegetation classification system proposed by [45] or to precisely discriminate between natural grasslands and planted pasturelands, the accuracy of the method may decrease substantially because of the similarities in the spectral responses.
The use of spectral mixture models has been reported to estimate woody and herbaceous cover using the photosynthetic vegetation (PV) fraction in addition to the nonphotosynthetic vegetation (NPV) fraction [66,67], which can include materials such as dry organic matter, litter, wood, and dry logs. According to [67], using only the vegetation fraction can overestimate seasonal vegetation, and the use of NPV can be useful to analyze the dynamics, function, and structure of vegetation. The use of four components (PV, NPV, shade, and soil) to generate land cover classifications in forested areas with high biomass, such as the Amazon rainforest, has been considered [68]. Recent research has focused on collecting photosynthetic and nonphotosynthetic endmembers of vegetation to enable the discrimination of dry and wet vegetation in environments with strong seasonality during the year [69]. However, few studies have investigated the applicability of these four components to quantify and monitor changes in seasonal vegetation and with low levels of biomass, as is the case for the Cerrado biome.
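A four-component mixture of this kind can be sketched as a small least-squares problem. The example below is not the LSMM implementation used in this study: the endmember reflectances are invented values for four bands (assumed order: blue, red, NIR, SWIR), the sum-to-one condition is added as an extra equation rather than enforced exactly, and no non-negativity constraint is applied. The helper names `solve` and `unmix` are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def unmix(pixel, endmembers):
    """Least-squares fractions for one pixel, with a sum-to-one row appended."""
    names = list(endmembers)
    k = len(names)
    # One equation per band, plus one equation asking the fractions to sum to 1
    rows = [[endmembers[name][b] for name in names] for b in range(len(pixel))]
    rows.append([1.0] * k)
    target = list(pixel) + [1.0]
    # Normal equations: (A^T A) f = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return dict(zip(names, solve(ata, aty)))

# Hypothetical endmember reflectances (blue, red, NIR, SWIR); not from the study
endmembers = {
    "PV":    [0.03, 0.04, 0.45, 0.18],
    "NPV":   [0.05, 0.10, 0.30, 0.25],
    "soil":  [0.15, 0.25, 0.32, 0.40],
    "shade": [0.01, 0.01, 0.01, 0.01],
}

# Build a synthetic mixed pixel from known fractions, then recover them
true_fractions = {"PV": 0.4, "NPV": 0.3, "soil": 0.2, "shade": 0.1}
pixel = [sum(true_fractions[n] * endmembers[n][b] for n in endmembers) for b in range(4)]
estimated = unmix(pixel, endmembers)
print({name: round(value, 3) for name, value in estimated.items()})
```

With noise-free synthetic data the fractions are recovered almost exactly; with real imagery, spectrally similar endmembers such as NPV and soil would make the solution far more sensitive to noise, which is consistent with the caution expressed above.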
Currently, there are some innovative methods and image synergisms that have been tested to improve the discrimination of land cover classes. The use of L-band synthetic aperture radar (SAR) images provided a significant improvement in the discrimination of woody canopy cover from African dry savanna [70]. This radar-based approach may improve the classification of humid savannas since SAR data are sensitive to moist vegetation. The use of the tasseled cap transformation of the Landsat and Sentinel-2 time series to discriminate the Cerrado physiognomies presented an overall accuracy of 63% [43]. Previous studies also showed that Bayesian unsupervised classification and multiple platforms/sensors improved LULC discrimination [71]. Harmonic models applied to time series have been used to classify forest types [72], facilitating the overview of phenological cycles that could be used to improve interclass separation.
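As a rough illustration of the harmonic models mentioned above, a first-order harmonic can be fitted to an annual series to obtain its mean, amplitude, and phase. The sketch assumes evenly spaced, gap-free samples covering exactly one year, in which case the least-squares coefficients have a simple closed form; real series with cloud gaps would need a general regression instead. The function name and the synthetic series are hypothetical.

```python
import math

def first_harmonic(values):
    """Fit y(t) = mean + a*cos(2*pi*t) + b*sin(2*pi*t) over one year.

    Assumes at least three evenly spaced, gap-free samples covering
    exactly one year, so the least-squares solution is closed-form.
    """
    n = len(values)
    mean = sum(values) / n
    a = 2.0 / n * sum(v * math.cos(2 * math.pi * k / n) for k, v in enumerate(values))
    b = 2.0 / n * sum(v * math.sin(2 * math.pi * k / n) for k, v in enumerate(values))
    amplitude = math.hypot(a, b)
    phase = math.atan2(b, a)  # radians; the peak falls at phase / (2*pi) of the year
    return mean, amplitude, phase

# Synthetic vegetation-fraction series sampled as 73 five-day composites
n = 73
series = [0.5 + 0.3 * math.cos(2 * math.pi * k / n - 1.0) for k in range(n)]

mean, amplitude, phase = first_harmonic(series)
print(round(mean, 3), round(amplitude, 3), round(phase, 3))
```

The amplitude plays a role comparable to the standard deviation used in this study as a measure of seasonality, while the phase adds information on the timing of the peak.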
For future investigations, we suggest evaluating the feasibility of combining the proposed approach with other methods that rely on multiple sensors and different image processing techniques to derive accurate LULC products in complex ecosystems.

Conclusions
In this study, a novel method to produce LULC maps based on the classification of the three maximum fraction images and three standard deviations derived from annual composites of the 100-m spatial resolution PROBA-V dataset was presented. The innovative idea involves the analysis of the temporal (annual) behavior of the proportions of vegetation, soil, and shade of the main LULC classes to classify them according to their maximum peaks and standard deviations (seasonality), which are closely related to the biophysical and phenological behaviors of those classes.
This study presented an overall classification accuracy of 92.4% and an agreement of 88% with the LULC map produced by the MapBiomas project. The user accuracy and the producer accuracy of classifying croplands, forestlands, and water bodies were above 90%. The method is suitable for processing in cloud-computing platforms, such as the Google Earth Engine, which helps address the high dimensionality of the data; therefore, it can be easily tested for regional- and global-scale classification purposes.
One of the main limitations of the method is the low number of observations (pixels) without cloud and cloud shadows in some regions over the year, which makes it harder to assign the maximum and standard deviation values to the correct LULC class. Data fusion or data synergism may be a suitable way to address the lack of temporal information, the mixture complexity of the annual proportion of endmembers in the Cerrado physiognomies, or even the spectral differences between well-managed and poorly managed pastures.