Mapping Fractional Cropland Distribution in Mato Grosso, Brazil Using Time Series Modis Enhanced Vegetation Index and Landsat Thematic Mapper Data

Mapping cropland distribution over large areas has attracted great attention in recent years, however, traditional pixel-based classification approaches produce high uncertainty in cropland area statistics. This study proposes a new approach to map fractional cropland distribution in Mato Grosso, Brazil using time series MODIS enhanced vegetation index (EVI) and Landsat Thematic Mapper (TM) data. The major steps include: (1) remove noise and clouds/shadows contamination using the Savizky–Gloay filter and temporal resampling algorithm based on the time series MODIS EVI data; (2) identify the best periods to extract croplands through crop phenology analysis; (3) develop a seasonal dynamic index (SDI) from the time series MODIS EVI data based on three key stages: sowing, growing, and harvest; and (4) develop a regression model to estimate cropland fraction based on the relationship between SDI and Landsat-derived fractional cropland data. The root mean squared error of 0.14 was obtained based on the analysis of randomly selected 500 sample plots. This research shows that the proposed approach is promising for rapidly mapping fractional cropland distribution in Mato Grosso, Brazil.


Introduction
Global population increase in addition to frequent extreme weather events (e.g., drought, flooding) require accurately updating cropland distribution and its dynamic change in a large area [1][2][3][4].Remote sensing has become a primary data source for mapping cropland distribution over large areas [5][6][7][8][9].Due to its long term history of data availability at no cost, Landsat imagery has been applied extensively for land use and land cover change detection [10][11][12], including cropland dynamic change [10].However, due to relatively infrequent re-visit times (e.g., 16 days re-visit for Landsat Thematic Mapper (TM)), obtaining cloud-free images may be difficult, especially in tropical and subtropical regions [13].Because of the time-consuming and extensive labor in analyzing high or medium spatial resolution images (e.g., Landsat) of a large area and the difficulty in acquiring cloud-free optical sensor images in the moist tropical regions, much research in the past decade has shifted to the use of

Study Area and Materials
The study area is located in the central and western part of Mato Grosso State, Brazil (Figure 1), featured by a transition from savanna in the south to tropical rainforests in the north.Soy is the main crop in this area and the sowing calendar for soybeans goes from mid-September to late December, depending on agricultural zoning for different soils, regions, and the onset of the rainy season [6,24].According to IBGE 2015 (The Brazilian Institute of Geography and Statistics), area planted in soybeans increased by 5.59 million ha from 1995 (2.34 million ha) to 2013 (7.93 million ha).In Mato Grosso State, six cropping types (soy-corn, soy-cotton, soy-millet, soy-soy, cotton and pasture) account for 91.5% of reported agricultural land area [24].These planting structures include single and double cropping systems.

Methods
Figure 2 illustrates the framework of mapping fractional cropland distribution using the MODIS EVI data and the two Landsat TM images.The major steps include: (1) extract cropland from Landsat TM imagery using unsupervised classification, aggregate the cropland image from 30 m cell size to 250 m cell size for producing fractional cropland distribution using the mean algorithm, and select sample plots using random sampling technique.Half of the selected samples were used for development of regression model and the other half were used to evaluate the estimates; (2) identify three key stages-sowing, growing, and harvest using the crop phenology analysis of time series MODIS EVI data and then calculate SDI; (3) develop a regression model for estimating fractional cropland by relating SDI and TM-derived fraction cropland data; and (4) evaluate the fractional cropland estimates using the SDI-based approach.The first step-extraction of cropland reference data from Landsat imagery was briefly described in Section 2. The following subsections describe the remaining steps.Framework of fractional cropland mapping approach using the integration of time series MODIS (Moderate-resolution Imaging Spectroradiometer) Enhanced Vegetation Index (EVI) and Landsat Thematic Mapper (TM) data.

EVI Profiles and Crop Phenology Analysis
The seasonal patterns of croplands provide the foundation for cropland extraction using time series remote sensing imagery.Crop phenology is primary information for cropland mapping based on the spectral-temporal crop growth profile analysis.Crops have their own characteristics in the stages of planting, growing and maturing compared to other vegetation types such as forests.In Figure 3 we illustrate the EVI time series profiles for different land cover types based on the cropping system in Mato Grosso State.Through a year the EVI values in croplands vary drastically in different stages from sowing and growing to harvest.In contrast, forest has constant EVI profiles throughout the year.Crop phenology analysis is used to determine the key identification stages (KIS): sowing (Stage 1-from dry to wet season transition), growing (Stage 2), and harvest (Stage 3) seasons.Figure 3 implies that croplands can be identified from the three stages, instead of time series data of an entire year.This is critical for mapping cropland distribution in tropical and subtropical regions because of the cloud cover problem resulting in the difficulty in acquisition of cloud-free imagery.

Time Sparse Resampling (TSR)
Cloud contamination and noise are still common in the MODIS vegetation index composites.The Savitzky-Golay (S-G) filter is often used to reduce this problem [40][41][42][43].Based on the regional crop phenology, a time sparse resampling (TSR) algorithm was proposed at the basis of the S-G filter (See Figure 4).The TSR algorithm was used to produce a cloud-free composite at different crop growth stages and to reduce the effect of the large variation of EVI profiles within each season.The time series EVI profile contains a large variation in croplands within a single year.The TSR used several images from the time series to produce a stable indicative factor within each season.For example, in Mato Grosso, Brazil, the croplands in the sowing season (from September to October) have very limited vegetation cover, a minimum EVI value is used in this period.The crops in growing season (November, December, and January) have the highest biomass, thus a maximum EVI value is selected.In the first harvest season (from January to March), there is a valley in annual crop growth profile, thus, the TSR algorithm is adjusted to acquire the minimum EVI value in the harvest season.The TSR is used to generate a new EVI composite in KIS as expressed in Equations ( 1

Seasonal Dynamic Index (SDI)
According to the EVI fluctuations at the different crop stages, the variation of EVI in a pixel can be assumed to have a positive correlation with the proportion of cropland area.Based on this hypothesis, the seasonal dynamic index (SDI) model is proposed through the EVI composites at different stages, as shown in Figure 4 for a typical crop phenology curve.Thus, SDI can be expressed as follows: Mask = Pas mask × Slp mask (7) where SDI is seasonal dynamic index and SDI 1 and SDI 2 correspond to the seasonal dynamic index at different crop cycles.Considering the positive values of SDI 1 and SDI 2 in the double cropping system and negative values in the single cropping system, abs (absolute value) is used.To be classified as cropland, in addition to slope restrains, the pixel should meet three conditions (see Figure 4): EVI has (1) low value in the sowing period (Stage 1); (2) high value in the growing period (Stage 2); and (3) low value in the harvest period (Stage 3).In order to improve fractional cropland estimation accuracy, Mask is used as an indicator for calculating cropland fraction, which includes terrain (slope) mask and grassland (or savanna) mask.Here the slope mask and grassland mask are defined as follows: (1) Slp mask is the topographic factor mask which the slope is derived from SRTM data.A threshold of 12% is used for Slp mask , that is, the pixels having slopes greater than 12% are excluded from calculation, because a relatively flat landscape is required for allowing the use of farm

Seasonal Dynamic Index (SDI)
According to the EVI fluctuations at the different crop stages, the variation of EVI in a pixel can be assumed to have a positive correlation with the proportion of cropland area.Based on this hypothesis, the seasonal dynamic index (SDI) model is proposed through the EVI composites at different stages, as shown in Figure 4 for a typical crop phenology curve.Thus, SDI can be expressed as follows: SDI " MAX pSDI 1 , SDI 2 q ˆMask (4) Mask " Pas mask ˆSlp mask (7) where SDI is seasonal dynamic index and SDI 1 and SDI 2 correspond to the seasonal dynamic index at different crop cycles.Considering the positive values of SDI 1 and SDI 2 in the double cropping system and negative values in the single cropping system, abs (absolute value) is used.To be classified as cropland, in addition to slope restrains, the pixel should meet three conditions (see Figure 4): EVI has (1) low value in the sowing period (Stage 1); (2) high value in the growing period (Stage 2); and (3) low value in the harvest period (Stage 3).In order to improve fractional cropland estimation accuracy, Mask is used as an indicator for calculating cropland fraction, which includes terrain (slope) mask and grassland (or savanna) mask.Here the slope mask and grassland mask are defined as follows: (1) Slp mask is the topographic factor mask which the slope is derived from SRTM data.A threshold of 12% is used for Slp mask , that is, the pixels having slopes greater than 12% are excluded from calculation, because a relatively flat landscape is required for allowing the use of farm machinery [21].Therefore, Slp mask is assigned to 1 when slope is less than 12%; otherwise, Slp mask is assigned to 0.
(2) Pas mask is a pasture mask.For Pas mask , the ratio of SDI 1 and SDI 2 is used to discriminate grassland and cropland.SDI 1 and SDI 2 have high value in double cropping, but SDI 1 has low value and SDI 2 has high value in single cropping.In contrast, grass land has a high value in SDI 1 and a low value in SDI 2 .Based on trial and error testing a threshold of 2.5 is used.That is, if the ratio of SDI 1 and SDI 2 is greater than 2.5, the pixel is assigned to 0; otherwise, the pixel is assigned to 1.

Analysis of the Relationship between SDI and Fraction Croplands
The Landsat-derived cropland imagery was aggregated to generate fractional cropland imagery with the same cell size as the SDI imagery.A total of 1000 sample plots were randomly selected from the overlapping regions between the fractional cropland imagery and the SDI imagery to extract the fraction value and corresponding SDI value at the same locations.The box whisker plot approach was used to examine SDI features, representing the graphical distribution of fractional cropland in a sample plot against the pixel SDI value (Figure 5).The fractional values ranging from 0% to 100% were separated into 10 groups with an interval of 10%.The SDI values range from 0 to 1 depending on the proportion of cropland cover in a MODIS pixel.In general, higher proportions of cropland in a sample plot correspond to a higher SDI values, as shown in Figure 5.This implies that the fraction of cropland in sample plots has a linear relationship with SDI values.This provides the foundation for developing a linear regression model for estimating fractional cropland values using the SDI variable.
(2) Pas mask is a pasture mask.For Pas mask , the ratio of SDI 1 and SDI 2 is used to discriminate grassland and cropland.SDI 1 and SDI 2 have high value in double cropping, but SDI 1 has low value and SDI 2 has high value in single cropping.In contrast, grass land has a high value in SDI 1 and a low value in SDI 2 .Based on trial and error testing a threshold of 2.5 is used.That is, if the ratio of SDI 1 and SDI 2 is greater than 2.5, the pixel is assigned to 0; otherwise, the pixel is assigned to 1.

Analysis of the Relationship between SDI and Fraction Croplands
The Landsat-derived cropland imagery was aggregated to generate fractional cropland imagery with the same cell size as the SDI imagery.A total of 1000 sample plots were randomly selected from the overlapping regions between the fractional cropland imagery and the SDI imagery to extract the fraction value and corresponding SDI value at the same locations.The box whisker plot approach was used to examine SDI features, representing the graphical distribution of fractional cropland in a sample plot against the pixel SDI value (Figure 5).The fractional values ranging from 0% to 100% were separated into 10 groups with an interval of 10%.The SDI values range from 0 to 1 depending on the proportion of cropland cover in a MODIS pixel.In general, higher proportions of cropland in a sample plot correspond to a higher SDI values, as shown in Figure 5.This implies that the fraction of cropland in sample plots has a linear relationship with SDI values.This provides the foundation for developing a linear regression model for estimating fractional cropland values using the SDI variable.

Mapping of Fractional Cropland Distribution and Evaluation of the Estimates
Based on the randomly selected 500 test samples, a regression model was developed to estimate fractional cropland areas for the whole study area, in which the fractional cropland data from TM data were used as a dependent variable, and SDI was used as an independent variable.The coefficient of determination (R 2 ) was used to evaluate the fitness of the regression model because it measures the percentage of variation explained by the regression model.

Mapping of Fractional Cropland Distribution and Evaluation of the Estimates
Based on the randomly selected 500 test samples, a regression model was developed to estimate fractional cropland areas for the whole study area, in which the fractional cropland data from TM data were used as a dependent variable, and SDI was used as an independent variable.The coefficient of determination (R 2 ) was used to evaluate the fitness of the regression model because it measures the percentage of variation explained by the regression model.
Since the cropland estimation for each pixel is a fraction value, the traditional pixel-based classification assessment approach [43,44] cannot be directly used for the evaluation of this estimates.Therefore, root-mean squared error (RMSE) and the scatterplot between reference data and estimates were used to evaluate the fractional cropland estimation results.

Application of the Proposed Approach to Other Dates for Mapping Fractional Cropland Distribution
The above-mentioned approach is based on the 2011 MODIS EVI data and two Landsat TM images in Mato Grosso.In order to explore the transferability of this proposed approach, we applied this approach to other years of MODIS EVI data in the same study area for estimating fractional cropland distribution.The time series EVI images from 2001 to 2011 at two-year intervals were collected.The SDI image for each selected year was developed using the methods discussed above.The same regression model which was developed from the relationship between the 2011 SDI and corresponding fractional cropland data from Landsat TM imagery was then applied to each SDI image.The spatial patterns of the fractional cropland distribution were visually compared, especially for four typical sites within the study area.

Results
A regression model, that is, f crop =1.1959 ˆSDI ´0.03, was developed and it was used to map fractional cropland distribution for entire study area (see Figure 6).In this factional map, the croplands are concentrated in southern and eastern parts with high fraction values, but are scattered in northern and western parts with fractional values of less than 50% in a pixel.In order to examine the impacts of different cropland patch sizes on fractional cropland mapping performance, Figure 7 provides a comparison of cropland distributions between two sites: Site 1 (Figure 7a-c) representing large patch sizes of croplands, and site 2 (Figure 7d-f) representing the complex land cover composition with relatively small patch sizes of croplands.The results indicate that the cropland estimates with large patch sizes have similar spatial patterns with the cropland distributions from Landsat data (Figure 7a-c), but the croplands with relatively small patch sizes have different spatial patterns (Figure 7d-f) because the complex composition of different land covers in a pixel results in relatively poor estimates.This implies that the complex landscape of the study area may be a major factor resulting in poor performance of cropland estimation.The scatterplot between the estimates and reference data based on the 2011 test samples are illustrated in Figure 8.A RMSE of 0.14 and the correlation coefficient of 0.89 were obtained, indicating a good overall estimation performance.The major errors are from two aspects, the pixels having fraction values of less than 0.2 or more than 0.8.This implies that the SDI-based model cannot accurately estimate the fraction values when the cropland proportion is greater than 80% or smaller than 20% in a MODIS pixel.

Discussion
Continuous time series NDVI or EVI data can provide detailed vegetation phenology information, thus they are often used for mapping cropland distribution [14,20,21,28].However, in tropical and subtropical regions, it is impossible to collect continuous time series optical sensor data (e.g., MODIS) due to the cloud cover problem.Therefore, we proposed the TSR algorithm in this research to solve the data collection problem.The TSR can enhance the differences between cropland and other land cover types, and so aid in the separation of cropland from other land cover.Meanwhile, the use of SDI can help explore the impacts of single or double cropping systems on cropland mapping.The linear relationship between SDI and fractional cropland values in a pixel further documents the potential to estimate fractional cropland in a large area using the SDI image.
This research indicated that errors occurred more frequently when the proportion of croplands in a pixel is very high or very low.When pixels have high proportion of croplands such as greater than 0.8, SDI is often saturated; however, when the pixels have low proportion of croplands such as less than 0.2, SDI is not sensitive enough, resulting in underestimation of cropland area.This situation is similar to previous studies when the cell size of the remotely sensed data is larger than the size of individual objects, this is especially true in urban landscapes, and referred to as mixed pixels [45].The results illustrated in Figure 8 also confirm that the small patch size of croplands, i.e., mixed pixels, is an important source of cropland mapping uncertainty.
Many previous "hard" classification methods such as thresholding approaches or classification algorithms are used to map cropland distribution [6,20,21,27], but the mixed pixel problem inherent in the coarse spatial resolution imagery (e.g., MODIS) often results in large uncertainty in cropland area statistics, resulting in poor area estimation and inaccurate spatial patterns.Although spectral mixture analysis presents an effective way to decompose the spectral reflectance of a pixel into different proportions, and has proven valuable in medium spatial resolution images such as Landsat or hyperspectral data [12,46,47], this approach is not so promising in coarse spatial resolution images such as MODIS due to the complex land cover composition and the difficulty in identifying suitable endmembers in large areas.This research using Landsat-derived cropland data as reference to establish a regression model by relating it to the SDI variable has proven feasible in estimating fractional cropland.
In addition to the spectral mixture analysis in reducing the mixed pixel problem, another approach is the use of data fusion of multi-resolution/sensor data [48][49][50][51].However, in a large area, the data fusion of Landsat TM and MODIS data may not be cost effective or may be not necessary because Landsat TM image can reliably provide cropland classification.In addition, in tropical regions like this study area, Landsat images are not available for many sites due to the cloud cover problem, thus this kind data fusion is not feasible.Therefore, this research proposed the combination of MODIS and a limited number of Landsat TM image to solve this data availability problem.A similar work has been used for mapping fractional forest distribution [45].A combination of multi-source data such as MODIS, Landsat images, and ancillary data is an alternative to improve cropland mapping performance [25].However, attention should be paid to potential geometric errors between different data sources.For example, the misregistration between Landsat and MODIS can reach a minimum of 50 m (nadir) [52].
The application of using this approach to other years of MODIS data in the same study area has indicated its value in rapidly mapping fractional cropland distribution in a large area.However, caution should be taken when attempting to transfer this approach to other study areas due to the different land cover composition.For future study, subpixel based approaches using nonparametric algorithms may be another direction for better extracting croplands, especially when accurate cropland area statistics are required [6,31,35].

Conclusions
The proposed SDI-based approach provided a feasible way for mapping fractional cropland distribution in Mato Grosso, Brazil.In summary, the following conclusions can be obtained: (1) The use of the TSR algorithm based on crop phenology analysis effectively solved the difficulty in collecting continuous time series data because of cloud contamination and noise.(2) The proposed SDI approach effectively extracted fractional cropland distribution in a large area.(3) The use of masks based on slope and pasture further reduced the confusion between croplands and pastures, and thus improved cropland mapping performance.(4) A RMSE of 0.14 was obtained for the cropland distribution in 2011.

Figure 2 .
Figure 2. Framework of fractional cropland mapping approach using the integration of time series MODIS (Moderate-resolution Imaging Spectroradiometer) Enhanced Vegetation Index (EVI) and Landsat Thematic Mapper (TM) data.

Figure 3 .
Figure 3. Determination of key identification stages based on crop phenology analysis.

Figure 4 .
Figure 4.The Enhanced Vegetation Index (EVI) features at different crop growth stages and the seasonal dynamic index.

Figure 4 .
Figure 4.The Enhanced Vegetation Index (EVI) features at different crop growth stages and the seasonal dynamic index.

Figure 5 .
Figure 5.The relationship between seasonal dynamic index (SDI) and fractional croplands.

Figure 5 .
Figure 5.The relationship between seasonal dynamic index (SDI) and fractional croplands.

Figure 6 .
Figure 6.Fractional cropland distribution which is developed using the regression model based on MODIS EVI data in Mato Grosso, Brazil.

Figure 7 .
Figure 7.A comparison of fractional cropland distributions between Landsat-derived reference data and the SDI-based estimates (note: (a-c) represent Landsat TM color composite (7/4/2 bands), fractional cropland distribution from Landsat imagery, and fractional cropland distribution from MODIS EVI data at typical site 1, respectively; (d-f) represent Landsat TM 7/4/2 band color composite, fractional cropland distribution from Landsat imagery, and fractional cropland distribution from MODIS EVI data at typical site 2, respectively).

Figure 8 .
Figure 8.The relationship between estimates from SDI and corresponding cropland fraction reference data from Landsat TM data in 2011.

Figure 9 .
Figure 9.A comparison of the developed fractional cropland distributions between 2001 and 2011 at a two-year interval, which is developed using the SDI-based approach (a-f represent the fractional cropland distribution in 2001-2011).