Abstract
Satellite Earth observations provide timely and spatially explicit information on crop phenology that can support decision making and sustainable agricultural land management. Accurate classification and mapping of croplands provide primary information for agricultural assessments. This study presents a digital agriculture approach that integrates Earth Observation big data analytics based on machine learning technologies to classify and map main crop types. Two supervised machine learning models were calibrated using the Random Forest algorithm from phenological metrics, estimated from time series of NDVI and LAI vegetation indices calculated using Sentinel-2 MSI satellite acquisitions. Models were calibrated for the Tuscany (Toscana) region in Italy. The results show a satisfactory overall accuracy (~78%) in cropland classification, and the model calibrated using LAI time series performed slightly better than the model calibrated using NDVI time series. The proposed approach offers the potential to accurately map crop types in a way that is useful to support agricultural land management and monitoring systems for large areas over time.
1. Introduction
Cropland mapping is becoming increasingly important in environmental topics that deal with sustainable agricultural production and natural resource management [1]. A wide range of stakeholders are now interested in this topic, including national authorities, local environmental agencies, regional government and authorities, municipalities, universities and research centers, civil protection agencies, insurance companies, and industries. Cropland mapping products respond to an information need strongly felt by users, driven by the growing attention of European policies to climate change mitigation and adaptation, and they foster sustainable agricultural practices, especially in the context of the European Green Deal strategy [2].
The information provided by increasing availability of Earth observation (EO) data makes satellite images of paramount importance for identifying, characterizing, and mapping crop typologies in both the space and time dimensions by exploiting the radar backscatter and the optical response of vegetation [3,4]. The commitment by the European Commission (EC) to encourage the development of EO products, possibly taking advantage of Copernicus in situ Component, makes value-added information derived from satellites of primary importance for supporting agricultural land management. Indeed, the EC has finally sanctioned the use of Copernicus Sentinel data, integrated with EGNOS/Galileo, for the control and granting of Common Agricultural Policy (CAP) payments by local authorities, promoting open data with a common data-sharing approach (Regulation (EU) 746/2018).
Multitemporal satellite images have been used successfully to estimate vegetation’s biophysical parameters and to identify phenological patterns [5,6,7]. More recently, the Copernicus Sentinel-2 satellite constellation, equipped with the MSI sensor, has made it possible to sense the Earth’s surface at high spatial, spectral, and temporal resolutions, showing its potential for estimating vegetation parameters such as phenological metrics (e.g., the start of season, the length of season, or the end of season) [7,8].
Many authors have investigated the efficacy of spectral and biophysical time series indices to differentiate crop types [9,10]. Vegetation spectral indices have been and are still widely used to detect the status of vegetation (e.g., growth, health, and cover), the most popular of which is the normalized difference vegetation index (NDVI) [11]. However, NDVI is limited by saturation at high values, where it loses sensitivity to further increases in vegetation cover. On the other hand, vegetation’s biophysical characteristics, such as the canopy structure and photosynthetic capacity, are well described by the leaf area index (LAI), which is widely used in agricultural studies of heterogeneous, fragmented smallholder agroecosystems [12,13].
Furthermore, the advances in analytical techniques, such as machine learning algorithms, enable us to deal with fast and robust analyses applied to big data. Among these, Random Forest (RF) is an ensemble learning classifier that has been successfully used in vegetation classification applications, including crop mapping [7,10,14].
The aim of this study was to present a digital agriculture approach that integrates EO big data analytics, based on a supervised machine learning model using temporal statistics and phenological metrics estimated from NDVI and LAI time series as predictors, to identify and map the main crop types. The performance of two supervised machine learning models, calibrated using the RF algorithm for a study area in central Italy, is presented and discussed.
2. Materials and Methods
2.1. Study Area
Tuscany is located in central Italy and covers about 23,000 square kilometers. The climate ranges from the Mediterranean dry climate along the coastline to the temperate humid and wet climate in the inland and northern areas of the region. Tuscany is mainly hilly (about 67%) and mountainous (about 25%), and it also includes some plains (about 8%). The cultivated areas represent about 39% of the region, mainly characterized by arable land, vineyards, and olive groves.
2.2. Satellite Images
Sentinel-2 (S2) satellite images acquired from November 2015 to October 2019 with cloud cover lower than 90% were collected for the 4 granules corresponding to the study area. The Multi-Spectral Instrument (MSI) sensor onboard S2 is characterized by a high spatial resolution (10 m, 20 m, and 60 m, depending on the band), a short revisit time (5 days with two satellites), and 13 spectral bands from visible to shortwave infrared. The images, distributed by Theia in the MUSCATE format as bottom-of-atmosphere (BOA) reflectance, orthorectified, terrain-flattened, and atmospherically corrected with the MACCS-ATCOR joint algorithm (MAJA) [15], were spatially resampled to 10 m and masked for invalid pixels (cloud, cloud_cirrus, cloud_shadow, topographic_shadow, snow, edge, sun_too_low). A static mask, generated from Copernicus Land Monitoring Service datasets, was applied to mask out pixels not corresponding to croplands.
2.3. Crop Type Maps
The reference crop type maps used in this study were made available by the Tuscany Regional Agency for Agriculture (http://dati.toscana.it/organization/artea, accessed on 17 October 2020) for the years from 2016 to 2019. This study focused only on the main crop types of the arable land, excluding permanent crops such as vineyards and olive groves. Selected crop typologies were grouped into 8 classes, taking the temporal pattern of the crops in the study area into account: winter cereals, clover and alfalfa, maize, sorghum, sunflower, rapeseed, horticultural crops, and soy. The centroid of each crop parcel polygon in the reference maps was used to query the raster predictors generated from the satellite images.
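The centroid-based query described above can be sketched as follows. This is a minimal illustration in Python, not the code used in the study: the function names, the parcel coordinates, and the raster origin are hypothetical, and a north-up raster with 10 m pixels in a projected coordinate system is assumed.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring [(x, y), ...] (shoelace formula)."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring if needed
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3.0 * area2), cy / (3.0 * area2)


def world_to_pixel(x, y, x_min, y_max, pixel_size=10.0):
    """Map projected coordinates to (row, col) of a north-up raster.

    x_min and y_max are the coordinates of the raster's upper-left corner.
    """
    col = int((x - x_min) // pixel_size)
    row = int((y_max - y) // pixel_size)
    return row, col


# Hypothetical 40 m x 20 m parcel and raster origin (illustrative values only).
parcel = [(600000.0, 4830000.0), (600040.0, 4830000.0),
          (600040.0, 4830020.0), (600000.0, 4830020.0)]
cx, cy = polygon_centroid(parcel)
row, col = world_to_pixel(cx, cy, x_min=599980.0, y_max=4830040.0)
```

The returned (row, col) pair would then index each predictor raster. Note that for strongly concave parcels the centroid can fall outside the polygon, a case this sketch does not handle.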
2.4. Time Series and Temporal Predictors
Two vegetation indices were selected to derive the main crop types in the study area: the NDVI and the LAI. The NDVI was calculated following Equation (1):
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \qquad (1)$$
where RED corresponds to the S2 MSI spectral band B4 and NIR corresponds to the S2 MSI spectral band B8. The Leaf Area Index (LAI) is defined as half of the total green (i.e., photosynthetically active) leaf area per unit of horizontal ground surface area. The biophysical processor [16] available in SNAP software was used to estimate the LAI from the surface reflectance data.
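As a small worked illustration of the NDVI definition (the normalized difference between the NIR and RED reflectances), the following Python sketch computes the index per pixel. The reflectance values are made up for the example, and returning None for a zero denominator is an assumption of this sketch rather than a documented choice of the study.

```python
def ndvi(nir, red):
    """NDVI from NIR (S2 band B8) and RED (S2 band B4) surface reflectance.

    Returns None where the index is undefined (NIR + RED == 0).
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Illustrative (NIR, RED) reflectance pairs: dense canopy, sparse canopy, no data.
pixels = [(0.45, 0.05), (0.30, 0.10), (0.0, 0.0)]
values = [ndvi(nir, red) for nir, red in pixels]
```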
The time series of the vegetation indices were first gap-filled and interpolated daily using the Stineman algorithm [17], and later temporally smoothed using the procedure based on second-order weighted polynomial fitting and Whittaker smoothing, as described in [18]. From the NDVI and LAI time series, temporal statistics and phenological metrics, derived following Gu et al. [19], were calculated and used as temporal predictors in the classification model (Table 1).
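The gap-filling and smoothing steps can be illustrated with a simplified Python sketch. It is not the procedure of [18]: linear interpolation is used here as a stand-in for Stineman interpolation, only the Whittaker step (with a second-order difference penalty) is shown, and the smoothing parameter `lam`, the function names, and the sample observations are assumptions. The dense solver is adequate only for short series.

```python
def fill_gaps_linear(days, values, n_days):
    """Daily series by linear interpolation between valid observations.

    Stand-in for Stineman interpolation; observation days are assumed distinct.
    """
    obs = sorted((d, v) for d, v in zip(days, values) if v is not None)
    out = []
    for t in range(n_days):
        if t <= obs[0][0]:
            out.append(obs[0][1])      # hold first value before first observation
        elif t >= obs[-1][0]:
            out.append(obs[-1][1])     # hold last value after last observation
        else:
            for (d0, v0), (d1, v1) in zip(obs, obs[1:]):
                if d0 <= t <= d1:
                    out.append(v0 + (v1 - v0) * (t - d0) / (d1 - d0))
                    break
    return out


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def whittaker_smooth(y, lam=10.0, weights=None):
    """Whittaker smoother: solve (W + lam * D'D) z = W y, D = second differences."""
    n = len(y)
    w = weights or [1.0] * n
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = w[i]
    stencil = (1.0, -2.0, 1.0)
    for r in range(n - 2):             # accumulate lam * D'D row by row
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lam * stencil[p] * stencil[q]
    b = [w[i] * y[i] for i in range(n)]
    return solve(a, b)


# Illustrative observations: day of acquisition and index value (None = masked).
daily = fill_gaps_linear([0, 4, 8, 12], [0.2, 0.6, None, 0.4], n_days=13)
smooth = whittaker_smooth(daily, lam=10.0)
```

Larger values of `lam` give a smoother curve; the optional weights allow cloud-affected or interpolated days to be down-weighted.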
Table 1.
List of time series statistics and phenological metrics used as model predictors. The predictors’ importance resulting from the classification model is expressed as the Gini index value. Variables with no importance value were not selected as model predictors. The abbreviation ‘dl’ stands for ‘dimensionless’.
All predictors with a Pearson correlation coefficient higher than 0.9 and a variance inflation factor (VIF) higher than 2.0 [20] were removed to avoid multi-collinearity.
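One possible reading of this screening step is sketched below in Python. The thresholds (0.9 and 2.0) come from the text, but the iterative strategy (repeatedly dropping the flagged predictor with the largest VIF) is an assumption, as are the function names and the sample values. The VIF of each predictor is taken as the corresponding diagonal element of the inverse correlation matrix, which is equivalent to 1 / (1 − R²) from regressing that predictor on the others.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(m):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def vif(columns):
    """Variance inflation factor per predictor (diagonal of inverse correlation matrix)."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def drop_collinear(columns, r_max=0.9, vif_max=2.0):
    """Drop predictors exceeding both thresholds, worst VIF first (assumed strategy)."""
    kept = dict(columns)
    while len(kept) > 1:
        names = list(kept)
        scores = vif(kept)
        flagged = [
            a for a in names
            if scores[a] > vif_max
            and any(abs(pearson(kept[a], kept[b])) > r_max for b in names if b != a)
        ]
        if not flagged:
            break
        del kept[max(flagged, key=scores.get)]
    return kept


# Hypothetical predictor values for six parcels (illustrative only).
predictors = {
    "max": [0.90, 0.70, 0.80, 0.60, 0.50, 0.85],
    "mean": [0.62, 0.48, 0.57, 0.41, 0.36, 0.58],
    "sos": [100, 130, 95, 120, 110, 125],
}
kept = drop_collinear(predictors)
```

In this toy example the seasonal maximum and mean are almost perfectly correlated, so one of the two is removed, while the start-of-season predictor is retained.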
2.5. Random Forest Classification
The R package ‘mlr’ [21] was used to tune the RF hyperparameters (i.e., mtry, min.node.size, ntree) through a 5-fold cross-validation with 20 repetitions, selecting the combination with the highest Cohen’s kappa coefficient. The tuned hyperparameters were used to calibrate the classification models from the NDVI and LAI predictors using the R package ‘ranger’ [22]. The variables’ importance for the final set of selected predictors used in the models was calculated using the Gini index.
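The resampling scheme behind this tuning step (5 folds repeated 20 times, giving 100 train/test splits per candidate) can be sketched in Python as follows. This is not the ‘mlr’ implementation: the function names are hypothetical, the folds are not stratified, and `evaluate` is a placeholder callback that is assumed to fit an RF with the given hyperparameters on the training indices and return Cohen’s kappa on the test indices.

```python
import random
from statistics import mean


def repeated_kfold(n_samples, k=5, repeats=20, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                       # new random partition per repetition
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def tune(candidates, n_samples, evaluate, k=5, repeats=20):
    """Return the candidate with the highest mean score over all splits."""
    splits = list(repeated_kfold(n_samples, k, repeats))

    def mean_score(params):
        return mean(evaluate(params, train, test) for train, test in splits)

    return max(candidates, key=mean_score)


# Toy stand-in for model fitting: the score simply peaks at mtry = 4.
def toy_evaluate(params, train, test):
    return 1.0 - abs(params["mtry"] - 4) / 10


candidates = [{"mtry": m, "min.node.size": 2} for m in (3, 4, 5)]
best = tune(candidates, n_samples=50, evaluate=toy_evaluate)
```

Averaging over repeated partitions reduces the dependence of the selected hyperparameters on any single random split.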
A stratified sampling method was applied to the crop type reference map of the year 2019 in order to select the pixels which represented all 8 classes of crop types and could be used as training samples for the classification and as test samples to verify the accuracy of the classification obtained. Here, 70% of the pixels were used as training samples and the remaining 30% as the test samples.
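A 70/30 split that preserves class proportions can be sketched as below. The text does not state whether the split itself was stratified, so applying the 70% fraction within each class is an assumption of this Python sketch, as are the function name, the seed, and the toy labels.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Split sample indices into train/test sets, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))  # training samples for this class
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return train, test


# Toy labels for 40 pixels in three classes (illustrative only).
labels = ["maize"] * 10 + ["soy"] * 10 + ["sunflower"] * 20
train, test = stratified_split(labels)
```

Splitting within each class ensures that rare classes, such as soy and rapeseed in this study, appear in both the training and the test samples.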
The results of the classifications obtained were evaluated by means of confusion matrices according to the test samples. Overall accuracy (OA), producer’s accuracy (PA), user’s accuracy (UA), and Cohen’s kappa coefficient (K) were assessed.
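For reference, the four accuracy measures can be derived from a confusion matrix as in the Python sketch below. The matrix orientation (rows as reference classes, columns as predicted classes) is an assumption, and the 2-class matrix is a made-up example rather than a result of this study.

```python
def accuracy_metrics(matrix):
    """OA, PA, UA and Cohen's kappa from a square confusion matrix.

    matrix[i][j] = number of test samples of reference class i predicted as class j.
    """
    k = len(matrix)
    n = sum(map(sum, matrix))
    row_tot = [sum(matrix[i]) for i in range(k)]                        # reference totals
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]   # predicted totals
    diag = [matrix[i][i] for i in range(k)]
    oa = sum(diag) / n
    # Producer's accuracy: correct / reference total (complement of omission error).
    pa = [diag[i] / row_tot[i] if row_tot[i] else None for i in range(k)]
    # User's accuracy: correct / predicted total (complement of commission error).
    ua = [diag[j] / col_tot[j] if col_tot[j] else None for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (oa - pe) / (1 - pe)
    return oa, pa, ua, kappa


# Made-up 2-class example with 100 test samples.
oa, pa, ua, kappa = accuracy_metrics([[40, 10], [5, 45]])
```

Because kappa discounts chance agreement, it can differ markedly between two models with nearly identical overall accuracy when class frequencies are unbalanced.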
Finally, the crop type map product for the year 2019 was predicted using the calibrated supervised machine learning models.
3. Results
The RF hyperparameter tuning produced the following settings: mtry = 5, min.node.size = 2, ntree = 893 for NDVI, and mtry = 4, min.node.size = 3, ntree = 424 for LAI. The selected predictor variables reporting the highest Gini index were 13 for NDVI and 11 for LAI (Table 1).
The resulting spatial crop type map is shown in Figure 1. Regarding the classification obtained from the NDVI time series analysis, an overall accuracy of 78.6% was achieved with a Cohen’s kappa coefficient of 0.54 (Table 2). Some classes were more accurately classified than others, such as clover and alfalfa (UA = 91.1%; PA = 82.8%), maize (UA = 69.5%; PA = 58%), and winter cereals (UA = 55.9%; PA = 69.2%). On the contrary, sorghum was the worst classified (UA = 6%; PA = 26.8%). Rapeseed and soy obtained low user’s accuracy (17.2% and 17.6%, respectively).
Figure 1.
Crop type map for the year 2019 for the area of the city of Pisa (Tuscany, Italy).
Table 2.
Confusion matrix of the RF results from the NDVI time series analysis. Producer’s (PA), user’s (UA), and overall (OA) accuracies (as percentages), as well as the Cohen’s kappa coefficient (K) are reported.
As for the classified crop types resulting from the LAI time series analysis, an overall accuracy of 78.3% was achieved, with a Cohen’s kappa coefficient of 0.59 (Table 3). Unlike the NDVI model results, the LAI model generally showed high user’s and producer’s accuracies for all the classes, except for rapeseed (UA = 10.7%), and the misclassification was principally with winter cereals.
Table 3.
Confusion matrix of the RF results from the LAI time series analysis. Producer’s (PA), user’s (UA), and overall (OA) accuracies (as percentages), as well as the Cohen’s kappa coefficient (K) are reported.
4. Discussion
The capacity to map crop types using phenological metrics with a high spatial resolution has been demonstrated in this research study for a heterogeneous, small, and fragmented agricultural system. Multi-temporal information has been demonstrated to increase the crop type classification’s accuracy significantly [7]. In the context of crop type mapping and the monitoring of agricultural practices, synthesizing information to fewer phenological metrics facilitates image data processing by reducing the time series’ dimensionality [18].
Azar et al. [1] analyzed the performance of crop classification from multi-temporal Landsat 8 OLI images over a study area in Northern Italy. Four supervised classification algorithms applied to the spectral indices’ profiles were tested over different time step datasets to assess the performance of in-season crop classification in the year 2013. The result was a crop type map with seven classes with OA = 86.5% that was produced five months ahead of the end of season, in the middle of July.
Other studies have reported crop classifications with a high accuracy (OA = 82%) for eight crop types [4] and have mapped cropland status (cropped or fallowed) with accuracies over 75% [23]. The crop recognition method can lose accuracy, especially when the mapped crops have high intra-class variability [10] or when insufficient knowledge of the field data relating to the phenological cycles of the crops is available [24,25].
Despite the overall accuracies and Cohen’s kappa coefficient being similar for both the NDVI and the LAI model, comparing the results for individual classes, the latter showed slightly higher performance.
Veloso et al. [26] worked toward crop classification (maize, soybean, sunflower) using the temporal profile of NDVI and radar backscatter (VH, VV, and VH/VV). Regarding the classification of the crop, they concluded that NDVI shows low ability to distinguish summer crops, except for sunflower, during the senescence period in August and September. Besides, during periods of strong cover development, NDVI’s sensitivity to biomass is more likely to become saturated. High misclassifications of horticultural crops may be related to the different seeding times of the horticultural species, which could increase the variability in terms of the range of the predictors’ values. With respect to soy and rapeseed, it should be noted that the small number of reference crops used for model calibration and validation could be the reason for such a low class accuracy.
Mueller-Warrant et al. [27] outlined that although converting multi-year land-use data into a crop rotation history is relatively simple in theory, the presence of classification errors can severely compromise the results. Given this fact, they proposed using a matrix of logically forbidden or extremely unlikely year-to-year land use transitions to detect classification errors. Likewise, the use of a priori knowledge of the local rotation practices could be a research avenue for improving crop type identification by constraining the classification models. Future research should consider the use of extended crop type information (e.g., the LUCAS Soil DataBase) in order to increase the number of classification model training points and therefore improve the overall accuracy.
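The idea of screening a multi-year classification against a set of forbidden transitions can be sketched in Python as follows. This is an illustration of the concept only, not the method of [27]: the function name is hypothetical, and the forbidden pairs and the parcel history are placeholders with no agronomic authority.

```python
# Placeholder set of year-to-year transitions treated as forbidden (hypothetical).
FORBIDDEN = {("sunflower", "sunflower"), ("rapeseed", "rapeseed")}


def flag_unlikely_transitions(history, forbidden):
    """Return (year_from, year_to, crop_from, crop_to) for each forbidden transition.

    history maps a year to the crop class predicted for one parcel in that year.
    """
    years = sorted(history)
    flags = []
    for y0, y1 in zip(years, years[1:]):
        pair = (history[y0], history[y1])
        if pair in forbidden:
            flags.append((y0, y1) + pair)
    return flags


# Hypothetical classified history of a single parcel.
history = {2016: "winter cereals", 2017: "sunflower", 2018: "sunflower", 2019: "maize"}
flags = flag_unlikely_transitions(history, FORBIDDEN)
```

Flagged parcel-years could then be reviewed or reclassified, for example by falling back to the class with the second-highest predicted probability.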
5. Conclusions
The study demonstrated the capacity of EO big data analytics to provide thematic products that support agricultural land management and fulfill the users’ requirements. The phenological metrics estimated from high-resolution imagery sensed by the Copernicus S2 satellite constellation, combined with a thematic reference dataset related to crop types, together with the use of advanced computational analytic techniques (the RF algorithm), allowed crop type mapping in heterogeneous, small, and fragmented agricultural systems. The calibrated NDVI and LAI supervised machine learning models showed similar performance, with the LAI model yielding slightly better results.
The supervised machine learning model, applied to a wider spatial extent, could contribute to the measurement and assessment of sustainability foreseen by the European Green Deal strategy, in terms of sustainable agricultural practices and environmental monitoring, and climate change mitigation and adaptation, in accordance with the stakeholders’ requirements.
Author Contributions
Conceptualization, F.F. and D.S.; methodology, F.F.; formal analysis, F.F.; investigation, D.S.; writing—original draft preparation, D.S. and F.F.; writing—review and editing S.M. and A.T.; supervision, A.T. All authors have read and agreed to the published version of the manuscript.
Funding
The research was funded by Italian Space Agency (ASI) in the framework of the agreement between ASI and the Italian Institute for Environmental Protection and Research (ISPRA) on “Air Quality” (Agreement number F82F17000000005).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Publicly available datasets were analyzed in this study. This data can be found here: [https://www.theia-land.fr/en/product/sentinel-2-surface-reflectance/, accessed on 17 October 2020], [http://dati.toscana.it/organization/artea, accessed on 17 October 2020].
Acknowledgments
This work contains modified Copernicus Sentinel data and Copernicus Service information (2021).
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Azar, R.; Villa, P.; Stroppiana, D.; Crema, A.; Boschetti, M.; Brivio, P.A. Assessing in-season crop classification performance using satellite data: A test case in Northern Italy. Eur. J. Remote Sens. 2016, 49, 361–380.
2. Taramelli, A.; Tornato, A.; Magliozzi, M.L.; Mariani, S.; Valentini, E.; Zavagli, M.; Costantini, M.; Nieke, J.; Adams, J.; Rast, M. An interaction methodology to collect and assess user-driven requirements to define potential opportunities of future hyperspectral imaging sentinel mission. Remote Sens. 2020, 12, 1286.
3. Inglada, J.; Arias, M.; Tardy, B.; Hagolle, O.; Valero, S.; Morin, D.; Dedieu, G.; Sepulcre, G.; Bontemps, S.; Defourny, P.; et al. Assessment of an Operational System for Crop Type Map Production Using High Temporal and Spatial Resolution Satellite Optical Imagery. Remote Sens. 2015, 7, 12356–12379.
4. Van Tricht, K.; Gobin, A.; Gilliams, S.; Piccard, I. Synergistic Use of Radar Sentinel-1 and Optical Sentinel-2 Imagery for Crop Mapping: A Case Study for Belgium. Remote Sens. 2018, 10, 1642.
5. Weissteiner, C.J.; López-Lozano, R.; Manfron, G.; Duveiller, G.; Hooker, J.; van der Velde, M.; Baruth, B. A Crop Group-Specific Pure Pixel Time Series for Europe. Remote Sens. 2019, 11, 2668.
6. Gao, F.; Anderson, M.C.; Hively, W.D. Detecting Cover Crop End-Of-Season Using VENµS and Sentinel-2 Satellite Imagery. Remote Sens. 2020, 12, 3524.
7. Vuolo, F.; Neuwirth, M.; Immitzer, M.; Atzberger, C.; Ng, W.-T. How much does multi-temporal Sentinel-2 data improve crop type classification? Int. J. Appl. Earth Obs. Geoinf. 2018, 72, 122–130.
8. Vrieling, A.; Meroni, M.; Darvishzadeh, R.; Skidmore, A.K.; Wang, T.; Zurita-Milla, R.; Oosterbeek, K.; O’Connor, B.; Paganini, M. Vegetation phenology from Sentinel-2 and field cameras for a Dutch barrier island. Remote Sens. Environ. 2018, 215, 517–529.
9. Djamai, N.; Fernandes, R.; Weiss, M.; McNairn, H.; Goïta, K. Validation of the Sentinel Simplified Level 2 Product Prototype Processor (SL2P) for mapping cropland biophysical variables using Sentinel-2/MSI and Landsat-8/OLI data. Remote Sens. Environ. 2019, 225, 416–430.
10. Belgiu, B.; Csillik, O. Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis. Remote Sens. Environ. 2018, 204, 509–523.
11. Rouse, J., Jr.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring Vegetation Systems in the Great Plains with ERTS. In Proceedings of the Third ERTS-1 Symposium, Washington, DC, USA, 10–14 December 1973; NASA: Washington, DC, USA, 1974; pp. 309–317.
12. Lambert, M.J.; Traoré, P.C.S.; Blaes, X.; Baret, P.; Defourny, P. Estimating smallholder crops production at village level from Sentinel-2 time series in Mali’s cotton belt. Remote Sens. Environ. 2018, 216, 647–657.
13. De Peppo, M.; Dragoni, F.; Volpi, I.; Mantino, A.; Giannini, V.; Filipponi, F.; Tornato, A.; Valentini, E.; Nguyen Xuan, A.; Taramelli, A.; et al. Modelling the ground-LAI to satellite-NDVI (Sentinel-2) relationship considering variability sources due to crop type (Triticum durum L., Zea mays L., and Medicago sativa L.) and farm management. In Proceedings of the SPIE Remote Sensing for Agriculture, Ecosystems, and Hydrology XXI, Strasbourg, France, 9–11 September 2019; SPIE Press: Strasbourg, France, 2019; Volume 11149, p. 111490I.
14. Lebourgeois, V.; Dupuy, S.; Vintrou, É.; Ameline, M.; Butler, S.; Bégué, A. A Combined Random Forest and OBIA Classification Scheme for Mapping Smallholder Agriculture at Different Nomenclature Levels Using Multisource Data (Simulated Sentinel-2 Time Series, VHRS and DEM). Remote Sens. 2017, 9, 259.
15. Hagolle, O.; Huc, M.; Desjardins, C.; Auer, S.; Richter, R. MAJA Algorithm Theoretical Basis Document. Available online: https://doi.org/10.5281/zenodo.1209633 (accessed on 7 December 2017).
16. Weiss, M.; Baret, F. S2 ToolBox Level 2 Products: LAI, FAPAR, FCOVER. 2016. Available online: https://step.esa.int/docs/extra/ATBD_S2ToolBox_L2B_V1.1.pdf (accessed on 31 March 2019).
17. Stineman, R.W. A consistently well behaved method of interpolation. Creat. Comput. 1980, 6, 54–57.
18. Filipponi, F.; Smiraglia, D.; Agrillo, E. Earth Observation for Phenological Metrics (EO4PM): Temporal discriminant to characterize forest ecosystems. Remote Sens. 2022, 14, 721.
19. Gu, L.; Post, W.; Baldocchi, D.; Black, T.; Suyker, A.; Verma, S.; Vesala, T.; Wofsy, S. Characterizing the seasonal dynamics of plant community photosynthesis across a range of vegetation types. In Phenology of Ecosystem Processes; Noormets, A., Ed.; Springer: New York, NY, USA, 2009; pp. 35–58. ISBN 978-1-4419-0026-5_2.
20. Zuur, A.F.; Ieno, E.N.; Elphick, C.S. A protocol for data exploration to avoid common statistical problems. Methods Ecol. Evol. 2010, 1, 3–14.
21. Bischl, B.; Lang, M.; Kotthoff, L.; Schiffner, J.; Richter, J.; Studerus, E.; Casalicchio, G.; Jones, Z.M. mlr: Machine Learning in R. J. Mach. Learn. Res. 2016, 17, 1–5.
22. Wright, M.N.; Ziegler, A. Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. J. Stat. Softw. 2017, 77, 1–17.
23. Wallace, C.S.A.; Thenkabail, P.; Rodriguez, J.R.; Brown, M.K. Fallow-land Algorithm based on Neighborhood and Temporal Anomalies (FANTA) to map planted versus fallowed croplands using MODIS data to assist in drought studies leading to water and food security assessments. GISci. Remote Sens. 2017, 54, 258–282.
24. Mingwei, Z.; Qingbo, Z.; Zhongxin, C.; Jia, L.; Yong, Z.; Chongfa, C. Crop discrimination in Northern China with double cropping systems using fourier analysis of time-series MODIS data. Int. J. Appl. Earth Obs. Geoinf. 2008, 10, 476–485.
25. Wardlow, B.; Egbert, S.; Kastens, J. Analysis of time-series MODIS 250 m vegetation index data for crop classification in the U.S. Central great plains. Remote Sens. Environ. 2007, 108, 290–310.
26. Veloso, A.; Mermoz, S.; Bouvet, A.; Le Toan, T.; Planells, M.; Dejoux, J.-F.; Ceschia, E. Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications. Remote Sens. Environ. 2017, 199, 415–426.
27. Mueller-Warrant, G.W.; Sullivan, C.; Anderson, N.; Whittaker, G.W. Detecting and correcting logically inconsistent crop rotations and other land-use sequences. Int. J. Remote Sens. 2016, 37, 29–59.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).