A Light-Weight Cropland Mapping Model Using Satellite Imagery

Many applications in agriculture as well as other related fields including natural resources, environment, health, and sustainability, depend on recent and reliable cropland maps. Cropland extent and intensity plays a critical input variable for the study of crop production and food security around the world. However, generating such variables manually is difficult, expensive, and time consuming. In this work, we discuss a cost effective, fast, and simple machine-learning-based approach to provide reliable cropland mapping model using satellite imagery. The study includes four test regions, namely Iran, Mozambique, Sri-Lanka, and Sudan, where Sentinel-2 satellite imagery were obtained with assigned NDVI scores. The solution presented in this paper discusses a complete pipeline including data collection, time series reconstruction, and cropland extent and crop intensity mapping using machine learning models. The approach proposed managed to achieve high accuracy results ranging between 0.92 and 0.98 across the four test regions at hand.


Introduction
Croplands are important not only for global food security but also for water security in a world where two-thirds of its population are estimated to experience severe water scarcity at least once a month [1,2]. According to the United Nations' Food and Agriculture Organization (FAO) cropland is "land used for the cultivation of crops, both temporary (annuals) and permanent (perennials) and may include areas periodically left fallow or used as temporary pasture" [3]. Cropland mapping and global agriculture monitoring are therefore very essential for food security, agricultural resources management, agricultural economics, vegetation studies, and land use/land cover mapping purposes [4][5][6]. Croplands mapping and monitoring is a widely used application of remote sensing given its evident advantages of spatial coverage, cost effectiveness, monitoring ability for its revisiting frequency, and its ability to detect crops stress [7][8][9][10][11][12][13]. Crop intensity mapping, on the other hand, provides important information on the changes in and productivity of agricultural land [14]. Specifically, crop intensity mapping refers to the segmentation of agricultural land according to the number of crop planting cycles it exhibits, with cycle numbers ranging from 0 to 3, where 0 refers to no crops occurring, and 3 refers to the case in which the same land experienced 3 complete growth cycles of its crops in one year [15], [16]. Such a task plays an important role in improving food production, and agricultural planning, through surveying cropland changes [14,17].
Among vegetation indices, Normalized Difference Vegetation Index (NDVI) data are very popular remote sensing-based information for cropland coverage mapping and crops conditions assessment. NDVI is widely used to estimate the extent and vigor of vegetation on earth. NDVI is advantageous in defining vegetated areas, determining the biophysical conditions, including most vegetation phenological and physiological states, and in distinguishing between vegetation and bare soil. On the other hand, NDVIs major limitations are evident in very poorly or very highly vegetated areas. For instance, in areas with bare soil and very low vegetation cover, NDVI tends to be more sensitive to soil while in highly vegetated areas, NDVI tends to saturate. NDVI data have proven its robustness for cropland extent mapping [18], classification of agricultural lands [19], land suitability analysis for crop cultivation [20], crop type mapping [21], assessment of leaf area index (LAI), and vegetation fraction identification [22], to mention a few.
NDVI is typically estimated using the percentage surface reflectance values of the red and near-infrared (NIR) bands acquired through remote sensing. NDVI is a unitless index that ranges from −1 to +1, with values close to −1 indicating areas with no vegetation or water, values close to 0 indicating areas with sparse vegetation or bare soil, and values close to +1 indicating areas with dense and healthy vegetation.
The availability of remote sensing data from different satellite programs and the ease of access to such data have led to the availability of time series NDVI datasets on global, regional, and local scales. Some of the satellite programs that make data freely available to the end users include MODIS, Landsat, and Sentinel, along with limited commercial satellite data with higher spatial and temporal resolutions from programs such as WorldView-3,-4, and Planet [23].
The NDVI time series data, however, typically contain some errors due to atmosphere conditions, cloud cover, and sensor capacity. It is therefore very important to analyze, smooth, and reconstruct NDVI data before its use in cropland mapping [24][25][26]. Matsushita et al. [27] and Kumari et al. [28] have discussed the effect of topographic illumination, shading effects, and solar angle issues on NDVI. Kumari et al. [28] has provided a better understanding of the variations of vegetation on hillslopes facing opposite directions. The study has shown that variations of solar radiation on polar-facing slopes would lead to higher values of NDVI compared to equatorial-facing slopes. Further, a discussion by Kumari et al. [28] has highlighted the advantage of the band rationing of NDVI in eliminating most of the effect topographic illumination, shading effects, and solar angle issues [28,29]. The common NDVI data reconstruction techniques for local and regional studies are spatial and based on proximity analysis. These methods consider the correlation between pixels and their neighborhood to restore the missing values. Some of the common spatial methods are based on linear, bilinear, and kriging interpolation. Some of the popular methods for reconstructing NDVI time series data are temporal-or spatiotemporal-based, which can generally be categorized into temporal interpolation, temporal filtering, temporal function-fitting, temporal deep learning, frequency-based methods, and hybrid techniques [30][31][32].
Given their high computational speed and promising classification results, many machine learning and AI techniques have been used with remote sensing data for the mapping and monitoring of cropland. Some of the popular methods include random forest, neural networks, and support vector machines [33][34][35][36]. These algorithms typically learn about the target class characteristics from the training dataset and identify the various classes in the input dataset. The recent literature on this topic includes a study by Ketchum et al. [37], in which a method for the large-scale mapping of irrigated agricultural lands in Western U.S. using random forest machine learning was developed. Zhang et al. [38] have mapped croplands in China with a machine learning classifier on Google Earth Engine.
Conducted as part of the International Telecommunication Union (ITU) competition, the main objective of this study was to develop a robust, cost-effective, machine learning, and AI-based method for cropland and crop intensity mapping in four study areas: Iran, Mozambique, Sri Lanka, and Sudan. The competition tasks included training dataset creation, NDVI data time series reconstruction, and cropland extent mapping as well as crop intensity mapping. The contributions of this paper are summarized in the following points:

1.
Provide a detailed methodology of data points collection of Sentinel-2 satellite assets though google earth engine for cropland extent and intensity applications for machine learning.

2.
Perform a NDVI time-series reconstruction on the data-points collected to patch gaps and errors within the series using a Savitzky-Golay filter and linear interpolation technique.

3.
Develop an adaptive threshold approach for crop intensity cycles detection based on the obtained NDVI time-series.

4.
Apply different machine learning techniques on the dataset at hand to generate cropland extent and intensity maps.

5.
Compare context aware and regular machine learning techniques in terms of efficiency and speed.

Sentinel-2 Data
Sentinel-2 data were used as input for our cropland extent and intensity models. The European Space Agency oversaw the development of the Sentinel-2, which is a constellation of two Earth observation satellites, as a part of the Copernicus Earth observation program of the European Commission. Sentinel-2 satellites' wide-swath, multi-spectral imaging capabilities offer an unprecedented perspective of the Earth, encompassing all its landmasses, sizable islands, and waterways. Applications in forestry, agriculture, and other land management fields benefit greatly from Sentinel-2 data. For instance, it can be used to map forest cover and soils, investigate leaf area together with chlorophyll and water content, and monitor inland waterways and coastal regions [39]. Two identical satellites make up the Sentinel-2 mission: Sentinel-2A, which was launched on 23 June 2015, and Sentinel-2B, which was launched in 2017. The constellation can revisit each location on the surface of the Planet once every five days with both satellites launched. Each satellite is equipped with a Multi-Spectral Instrument (MSI), which creates photographs of the Earth with a resolution of ten meters per pixel, a field of view of 290 km, and thirteen bands spanning the visible and infrared spectrum [39].

Study Area and Dataset
The study region of this paper covers four different countries: Iran, Sudan, Mozambique, and Sri Lanka, as illustrated in Figure 1. The regions covered are fairly diverse in land cover types, including grassland, forests, water, bare soil, and asphalt; but, most importantly, they include ample agricultural land. The four regions differ in climate, with the region in Iran having a semi-arid climate, the region in Sudan having a hot desert climate, and lastly, the region in both Mozambique and Sri Lanka having a tropical climate. Consequently, the cropland captured in the study area's collected data is diverse in the type of crops included.
The datasets used in this paper consist of Normalized Difference Vegetation Index (NDVI) time series data of 10-m spatial resolution collected from a 15-day Sentinel-2 composite, which covers a region of 0.5 degrees by 0.5 degrees for each of the four studied countries. NDVI is a remote sensing index used to assess vegetation health and quantity [40]. NDVI also helps differentiate vegetated land from other land-cover types in imagery as it tends to have a positive value for pixels related to vegetated land and zero or a negative value for pixels pertaining to other types of land-cover, such as water [41]. Hence, NDVI was used in this work in order to perform both cropland and crop intensity mapping. NDVI relies on the red band and the Near-Infrared band and is calculated as follows, where NIR and red are the surface reflectance valeus of the Near-Infrared band and the Red band respectively: The geo-spatial assets were provided by the international AI for good and ITU Cropland mapping competition organizers. The exact type of crops mapped is relative to the study region. The tiles provided were later used to generate data points using Google earth engine by applying two different data collection approaches based on the problem requirement. For the problem of cropland extent, we collected 75 geometric points of cropland and 75 points of non-cropland using eye inspection for each of the four study regions. The geometric points were then appointed a binary label according to their class (cropland, non-cropland). As for the problem of crop intensity mapping, we used geometric polygons of water surfaces, cropland, and non-cropland areas to form our datasets. The polygons allowed us to collect many data points, ensuring that we captured data pertaining to all crop intensity classes, of which we only used 200 points. These data points were then appointed a label of zero crop cycles, one crop cycle, two crop cycles, or three crop cycles through the crop cycle counting algorithm described later in this paper with the crop cycle number referring to the number of complete crop growth cycles a land exhibited in one year. After appointing the labels, we dealt with any imbalances found in the datasets caused by under-representation of certain classes by oversampling these underrepresented classes. It is worth mentioning that the size of the datasets was restricted to a maximum of 500 training data points per dataset as per competition rules, with smaller datasets being preferred, hence the small size of the collected datasets for this paper. Each of these data-points represented a pixel-wise NDVI value recorded every two weeks which resulted in a time-series of 24 points in total.

Data Pre-Processing
This section details the dataset pre-processing and preparation procedure followed in this paper. Figure 2 shows the four study areas data collection for the cropland extent problem with 150 samples per country. Those 150 samples were divided into 75 cropland and 75 noncroplands. These samples are visualized in the figures below, with orange pins representing cropland areas and grey pins representing non-cropland areas.  Figure 3 illustrates the data collection process for the cropland intensity problem in Mozambique. This process was replicated for the three other study areas. Samples were collected using polygons to ensure comprehensive coverage of all four crop intensity classes and to guarantee sample variability within each class.

Cropland Extent and Intensity Samples Collection Procedure
The dataset used for the two problems of cropland extent and intensity mapping was collected from a gap-filled NDVI time series provided by the ITU competition organizers. Since training classifiers on an incomplete time-series negatively impacts their accuracy, it was necessary to first reconstruct the dataset's time series data, filling in the gaps where data were missing, before proceeding to use the dataset to train any classifier. Hence, to fill in the gaps, iterative interpolation for data reconstruction (IDR) and was used and missing data points were simply replaced with the average value of their adjacent points where sudden and drastic changes in NDVI values were noted. Further, linear interpolation was also applied when an NDVI value difference greater than 0.4, around half of the greatest NDVI value found in the collected dataset, was noted between two adjacent points where the second point was replaced to achieve a more natural rate of change in the NDVI scores of the time series data. This method was selected as the gap-filling procedure amongst other more complex options such as Fourier-based approach (Fourier), the double logistic model (DL), the Whittaker smoother (Whit), and the locally adjusted cubic spline capping approach (LACC) [42] given that the missing NDVI values were not expected to drastically deviate from their neighboring values, that is, following a typical vegetation growth cycle, and that such a replacement was not going to affect the mapping for which the classifiers are training. The time-series was further cleaned by applying a quadratic-polynomial-based Savitzky-Golay filter to the collected time series data, resulting in smoother NDVI time series curves. Given that the dataset's time series curves contained 24 points only, the filter was configured to have a sliding window of 3 points to not cause significant loss in the time series' curve characteristics. Figure 4 shows a sample time series from the collected dataset before (a) and after (b) applying the detailed reconstruction method. The flowchart in Figure 5 depicts said procedure.

Adaptive Threshold Approach for Crop Intensity Cycles Detection
While labelling for the cropland extent dataset was performed through visual interpretation, crop intensity labels were not determined as easily. Crop intensity was instead determined through an algorithm developed for the purpose of crop cycle counting. The developed algorithm traversed each time series in the dataset and found the total number of peaks in the time series curves. Based on the number of peaks found, a time series was assigned a label: A class 0 label when no crop cycles were detected, a class 1 label when only one crop cycle was detected, a class 2 label when two crop cycles were detected, and lastly, a class 3 label when three crop cycles were detected. The developed algorithm paid particular attention to eliminating spurious peaks from the crop cycle count. Spurious peaks mainly occur in time series data due to the growth of weeds in fields, which cause spikes in NDVI scores. These spikes are presented in NDVI time series curves as small peaks with values, much lower than that of cropland peaks which have NDVI scores typically above 0.5 [43]. When these peaks are not dealt with, the crop cycle count in a given time series is overestimated and hence the resulting crop intensity estimate made is inaccurate. The algorithm first traverses a given time series and finds the maximum NDVI score. If the highest NDVI score in the time series is found to be less than 0.3, the time series is immediately given the class 0 label, meaning it does not contain any crop cycle. This is because NDVI timeseries of such low NDVI scores cannot correspond with actively used cropland, which typically reflect higher NDVI scores, instead they are more likely to pertain to barren land, rocks, and water or sparse vegetation. Otherwise, two thresholds are created based on this value: a minimum value for thresholding peaks and a maximum value for valleys. These thresholds are used to find the number of crop cycles in each time series while making sure that spurious peaks caused by weeds are not included in the count. In the previous literature relating to crop intensity mapping, spurious peaks were removed from the crop cycle count by simply using a static threshold for the peak values, where peaks that fell below a specific NDVI or EVI score were considered spurious [15,43,44]. However, we have found that such an algorithm is insensitive to the fact that different types of crops differ in the NDVI scores they can attain during a crop cycle, assuming one threshold for all types of crops. This insensitivity can result in underestimating crop intensity. To deal with this problem, we introduced the concept of adaptive thresholds, where peaks and valleys in a given time series are accepted based on thresholding values tailored to that time series. In the case of our algorithm, we set the thresholding value for peaks in any time series to be 0.70 of the NDVI score of the absolute maxima in that time series, whereas the valley threshold was set to be 0.20 of that value. After finding the thresholds, the time series is traversed once more in order to find the number of crop cycles. A crop cycle is then considered to have occurred when a peak value that is encompassed between two valley values is found. In the special case where the peak value is found at an endpoint, a crop cycle is counted if and only if the peak is either lead or followed by a valley.

Classification Methods
There are primarily five machine learning algorithms deployed for the purpose of the binary classification of our sample points for the cropland extent problem as well as the categorical classification of our reconstructed NDVI time series for the crop intensity problem. The models used for this purpose are as follows:

Random Forest
Random forest is a classification algorithm that is built of multiple decision tress. Firstly, n records are selected at random from a data set with k records. Next, each sample's decision tree is built separately, and consequently, each decision tree produces an output. Finally, the model's output is based on the Majority Voting ensemble that combines the results from all the individual decision trees [45].

XGBoost Classifier
XGBoost is short for Extreme Gradient Boosting. It is a distributed gradient-boosted decision tree. The trees are boosted by parallel trees and are usually the leading machine learning algorithm for classification problems. Similar to random forest, XGBoost also employs decision trees as base learners. However, the trees used by XGBoost are CARTs (Classification and Regression trees) that contain real-value scores in each leaf node instead of a single decision [46].

LSTM
LSTM [47] is short for Long Short-Term Memory and is a special type of RNN (Recurrent Neural Network) capable of handling the vanishing gradient issue faced by RNN. It can process the entire sequence of data and remember long term dependencies. LSTM has feedback and has an input flow which can either be backwards or forwards. Equations (2) and (3) describe the encoder and decoder of an LSTM; where h t is the encoder's hidden state at time step t with input token embedding x t . On the decoder side, s t denotes the hidden state at time step t with input embedding token y t . In this study, we have constructed an LSTM model consisting of one LSTM layer of 50 units, followed by a dense layer.

Bidirectional LSTM
Bidirectional LSTM works similar to LSTM, but the main difference is that bidirectional can make input flows in both directions, backwards and forward, since it consists of two LSTMs: one which takes the input in the forward direction and the other taking it in the backward direction. With the help of BiLSTMs, the network has access to more information, which improves the context available to the algorithm [48]. In our study, we have constructed a Bi-LSTM model comprising two consecutive BiLSTM layers of 50 units each, followed by a dense layer.

KNN DTW
KNN is the K-nearest neighbors' algorithm. It is well known for classification and works by finding the distances between a value and other examples in the data and then chooses the most frequent label for classification. DTW stands for Dynamic Time Warping and is used in time series analysis. It measures the similarity between two temporal sequences. This approach is popular for time series classification given the algorithm's speed and scalability. Since we are dealing with the crop intensity problem involving NDVI time series classification, KNN with DTW seemed to be the more appropriate choice of algorithm for this specific problem [49].

Computational Complexity
As we are looking into developing a light-weight solution to cropland mapping, accuracy is not the only metric that should be taken into consideration, the computational complexity of the algorithms implemented should also be considered. In general, traditional machine learning algorithms are more efficient than ensemble methods and recurrent neural networks. The complexity of Random forest is the lowest among tested algorithms with a training time complexity of O(T × n × log(n) × m) and a space complexity as small as O(d × k) where T is the number of trees, n is the number of training samples, m is the number of features, d is the depth of tree, and k is the number of neighbors [50]. On the other hand, KNN-DTW follows a slightly more expensive algorithm with a training time complexity of O(k × n × m) and a space complexity of O(n × m) [51] while XGBoost follows a lower efficiency level of O(Kd||x || 0 logn) [52]. Finally, the time complexity of LSTM and Bidirectional LSTM relies on the number of edges in the network W with a time complexity of O(W) and O(2W), respectively [53]. In general, literature suggests that there is a tradeoff between accuracy and speed when it comes to generating cropland maps. Traditional machine learning algorithms are not as computationally expensive as RNNs and are thereby faster but have slightly less accurate results.

Results and Discussion
Our results in Table 1 show that traditional machine learning algorithms, such as Random Forest and XGBoost, performed well across all four test regions for cropland extent. XGBoost model achieved the highest average accuracy (88.5%) overall outperforming more computationally expensive recurrent neural network models such as LSTM and Bidirectional LSTM. In general, the test region of Iran resulted in the best performance where Bi-LSTM achieved 98% accuracy. On the other hand, Mozambique's test region was the most challenging as the highest accuracy achieved was 82% through both XGBoost and LSTM models.  Table 2 shows that the results obtained for cropland intensity mapping problem were consistent with those of cropland extent mapping, as XGBoost scored the highest average accuracy (87.3%) across all test regions and outperformed random forest model and KNN DTW model. Context aware models such as LSTM, Bidirectional LSTM, and KNN DTW seemed to perform better than context unaware algorithms.

Maps Generated as Google Earth Engine Assets
These results can be explained by the geographical nature of the four test regions. We believe that the presence of many forests in the region of Sri-Lanka (Figures 6a and 7a) reduced the model's performance as the corresponding time series of forests samples were like those of cropland areas. The region of Mozambique (Figures 6b and 7b) contained samples that were difficult to distinguish even through eye-inspection, while the region of Iran (Figures 6c and 7c) contained easily distinguished cropland or non-cropland areas with a high variance present in the corresponding time series. Additionally, the presence of a body of water such as the Nile River in the test region of Sudan (Figures 6d and 7d) could decrease the performance because it is fringed by gallery forests and herbaceous vegetation which can be misleading.  To test the inference time of the algorithms at hand, we measured the time needed to generalize the results on a 0.5 • by 0.5 • grid, which corresponds to about 55 km squared, in the four test regions.
As Table 3 shows, traditional machine learning methods seemed to be more timeefficient when applied on the full map with an average inference time of 4 min for Random Forest and less than a minute using XGBoost with a Python 3 Google Compute Engine backend. On the other hand, context aware models took longer time to perform the same task with approximately 1 h 30 min for the LSTM model using the same computational resources. These results are consistent with the theoretical computational complexity analysis of the algorithms discussed earlier. We believe that traditional machine learning algorithms are the optimum solution to operate on large grid maps given their relatively high accuracy, low computational cost, and fast inference time. Figure 8 illustrates the average inference time of each model with respect to the average accuracy.

Conclusions
In this study, we present a comprehensive pipeline for cropland extent and intensity mapping using lightweight machine-learning models. Time series samples were collected from Sentinel-2 data using Google Earth Engine and utilized to train various machine learning models for cropland and crop intensity classification after a series of preprocessing steps. The performance of traditional machine learning algorithms, such as Random Forest and XGBoost, was compared with context-aware methods like KNN-DTW and with Recurrent Neural Networks (LSTMs and Bi-LSTMs). The evaluation results of this study demonstrate that all algorithms achieved high accuracy, ranging between 84% and 98%, across four test regions: Iran, Sri Lanka, Sudan, and Mozambique. After evaluation, the obtained results were generalized over a complete map with a grid size of 0.5 • by 0.5 • , captured from 10-m spatial resolution data, after which the inference time of each algorithm was measured, and all models were compared. Based on our findings, we conclude that traditional machine learning algorithms offer a more efficient and lightweight solution for cropland mapping due to their relatively high accuracy and fast inference time.