AgroShadow: A New Sentinel-2 Cloud Shadow Detection Tool for Precision Agriculture

Abstract: Remote sensing for precision agriculture has been strongly fostered by the launches of the European Space Agency Sentinel-2 optical imaging constellation, enabling both academic and private services to guide farmers towards a more productive and sustainable management of agroecosystems. In addition to the free and open access policy adopted by the European Space Agency (ESA), software and tools are also available for data processing and deeper analysis. Currently, a bottleneck in this valuable chain is the difficulty of identifying shadows in Sentinel-2 data, which is a persistent problem for precision agriculture applications. To overcome this issue, we present a simplified tool, AgroShadow, to take full advantage of Sentinel-2 products and resolve the trade-off between the omission errors of Sen2Cor (the algorithm used by the ESA) and the commission errors of MAJA (the algorithm used by Centre National d’Etudes Spatiales/Deutsches Zentrum für Luft- und Raumfahrt, CNES/DLR). AgroShadow was tested and compared against Sen2Cor and MAJA on 33 Sentinel-2A/B scenes, covering the whole of 2020 and 18 different scenarios across Italy at farming scale. AgroShadow returned the lowest error and the highest accuracy and F-score, while precision, recall, specificity, and false positive rates were always similar to the best scores, which were returned alternately by Sen2Cor or MAJA.


Introduction
The Sentinel-2 Multi-Spectral Imager (MSI) instruments deliver a remarkable amount of global data with high spatio-temporal resolution (10-20-60 m with a revisit time of 5 days in cloud-free conditions) and spectral sampling, essential for numerous operational applications such as land monitoring and risk assessment [1].
Agriculture is now strongly influenced by technology, and the availability of reliable, high-quality data can optimize production and maximize profits [2]. In particular, agricultural systems can take advantage of the information in Sentinel-2 data to detect variations in soil properties and crop yield and to support more sustainable cropping practices (e.g., water management, manuring and fertilizer application) [3][4][5][6]. For these purposes, however, a reliable detection and discrimination of clouds/cloud shadows is crucial, and the availability of free and open access big data has prompted developers to provide end users with easy and ready-to-use products and tools that automate atmospheric correction and cloud/cloud shadow masking [7][8][9][10].
Many discrimination methods and approaches have been developed in recent years for both low- and high-resolution remote sensing images [11][12][13]. Some focus on shadows cast by ground features such as buildings or trees (especially for high-resolution images) [14,15], others on topographic shadows [16,17], and others on cloud classification [18][19][20][21].
Mostafa [12] and Shahtahmassebi et al. [22] reviewed several detection and deshadowing methods for all three categories. Hollstein et al. [23] compared several classification techniques based on machine learning, including decision trees, Random Forest and Bayesian classifiers, whereas [24] developed the Spatial Procedures for Automated Removal of Cloud and Shadow (SPARCS) using a Neural Network approach.
Cloud shadow masking can be even more challenging. Shadows can create misleading reflectance signals, as they can be cast over surfaces with similar spectral signatures (e.g., dark soils, wetlands, or burned vegetation) [24][25][26], or by thin clouds with soft boundaries [23]. To date, most automatic cloud shadow classification tools are based on geometry-identification methods [26][27][28] that threshold a single spectral band, reflectance differences or ratios, or derived indices (e.g., the Normalized Difference Vegetation Index, NDVI, or snow indices).
Different services have developed tools to process Level-2 products for Sentinel-2, including cloud/cloud shadow masks. Sen2Cor [29,30], provided by the European Space Agency (ESA), and MAJA [31,32], provided by the Theia Data Center, are two of the most widely used examples. The main difference between them is that Sen2Cor uses a single-date approach for cloud detection, whereas MAJA uses a multi-temporal approach. The performance of Sen2Cor and MAJA, together with Fmask [10,26], was compared by [25,33,34], revealing a fairly good overall accuracy. However, omission/commission errors were evaluated on complete scenes or on portions of a few km.
In this paper we present a novel tool for detecting cloud shadows from Sentinel-2 imagery at farming scale: the AgroShadow tool. The tool was developed to reduce misclassifications in precision farming applications over different agroclimatic and orographic areas. Three main advantages make AgroShadow an easy-to-use tool for shadow detection: (1) the tool is based only on thresholding and requires neither cloud locations nor solar geometry definitions; (2) working at field scale requires low computational effort; (3) the tool provides a cloud shadow mask that can be integrated into any other classifier.
The AgroShadow tool is primarily based on the modified-OPtical TRApezoid Model (OPTRAM) soil moisture index [35], which uses the Short Wave InfraRed (SWIR-B12) band and the NDVI of Sentinel-2 MSI. To evaluate the robustness of the AgroShadow tool and to identify environments where shadow detection is potentially critical, we compared its performance against manually classified areas, selected from Sentinel-2 scenes over different geographic areas of Italy, with different shadow conditions and covering all seasons. We also tested the accuracy of cloud shadow classifications at field scale made by the Sen2Cor and MAJA tools, to verify whether they can be substantially improved by the AgroShadow tool.

Study Area and Data Retrieval
Eighteen locations were selected across the Italian territory (Figure 1), characterized by different types of climatic conditions (from hot and dry to more humid areas) and by morphology, including plains, hills and steep slopes.
To assess the effectiveness of the cloud shadow tool for agricultural applications, we selected fields of different sizes (between 30 and 200 ha). For each location (Table 1), from 1 to 3 satellite images acquired throughout 2020 were downloaded, including several types of cloud shadows (with soft/clear boundaries, related to thin clouds, low cumulus, etc.), soil moisture, crops (cereals, rice, mixed crops, etc.), vegetation growing status, irrigation practices (rainfed, irrigated, flooded) and different land covers.
Sentinel-2 images were downloaded both from the Copernicus Open Access Hub (https://scihub.copernicus.eu/, accessed on 24 November 2020) and the Theia-Land Data Center (https://theia.cnes.fr/, accessed on 26 November 2020). From the Copernicus Open Access Hub we selected the following Level-2A Bottom Of Atmosphere (BOA) reflectance data and products: (i) 10 m channels B4 and B8, representing, respectively, the Red and Near InfraRed surface reflectance used to calculate NDVI; (ii) 20 m channel B12 (SWIR reflectance) used, together with the NDVI, for soil moisture modeling and the shadow mask; (iii) 10 m True Color Image (TCI) composite, necessary to manually identify shadow samples for the selected fields (Figure 2); (iv) 10 m channel B2 (blue reflectance) and 20 m channel B11 (SWIR reflectance) used for discriminating soil from water; (v) 20 m Scene CLassification (SCL) map to compare shadows identified by our tool to those of Sen2Cor.
As an additional comparison, we downloaded 20 m CLoud Mask (CLM) and MG2 files from the Theia-Land Data Center.

The AgroShadow Tool
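The AgroShadow detection tool relies on the modified-OPTRAM model, implemented for estimating soil water content [35]:

$$W = \frac{STR - i_d\, e^{s_d\, NDVI}}{i_w\, e^{s_w\, NDVI} - i_d\, e^{s_d\, NDVI}}$$

where W is soil moisture and STR is the SWIR (band 12, Sentinel-2) Transformed Reflectance, calculated as:

$$STR = \frac{(1 - R_{SWIR})^2}{2\, R_{SWIR}}$$

with $R_{SWIR}$ the B12 surface reflectance, and where $i_d$ and $s_d$, $i_w$ and $s_w$ are, respectively, the dry and wet edge parameters of the exponential function of the model, depending on the STR-NDVI pixel distribution.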
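As an illustration of how the soil moisture index responds to dark and bright pixels, the following minimal sketch evaluates W for single pixels. It assumes the usual OPTRAM definition STR = (1 - R)^2 / (2R) for the B12 reflectance R and exponential dry and wet edges of the form i·exp(s·NDVI); the function names and the edge parameter values are hypothetical and are not the calibrated values of [35].

```python
import math

def ndvi(red, nir):
    """NDVI from Red (B4) and Near InfraRed (B8) surface reflectance."""
    return (nir - red) / (nir + red)

def str_swir(swir):
    """SWIR Transformed Reflectance from B12 reflectance (0 < swir <= 1)."""
    return (1.0 - swir) ** 2 / (2.0 * swir)

def soil_moisture(red, nir, swir, i_d, s_d, i_w, s_w):
    """Normalized soil moisture W between exponential dry and wet edges (assumed form)."""
    v = ndvi(red, nir)
    str_dry = i_d * math.exp(s_d * v)   # dry edge at this NDVI
    str_wet = i_w * math.exp(s_w * v)   # wet edge at this NDVI
    return (str_swir(swir) - str_dry) / (str_wet - str_dry)

# Hypothetical edge parameters, chosen only so that the example runs.
EDGES = dict(i_d=0.2, s_d=1.0, i_w=2.0, s_w=1.5)

w_sunlit = soil_moisture(red=0.08, nir=0.40, swir=0.20, **EDGES)
w_dark = soil_moisture(red=0.08, nir=0.40, swir=0.05, **EDGES)    # very low SWIR
w_bright = soil_moisture(red=0.08, nir=0.40, swir=0.60, **EDGES)  # very high SWIR
```

With these illustrative parameters, the sunlit pixel falls between the edges (0 < W < 1), the dark pixel exceeds the wet edge (W > 1) and the bright pixel falls below the dry edge (W < 0), which is the behaviour that the threshold checks of the processing chain rely on.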
The processing chain, launched on fields defined on-the-fly by users, includes a series of checks based on thresholds on reflectance ratio and indices, and k-means classification algorithm, essential to avoid misclassifications. The adopted criteria consist of:
• a threshold of B2/B11 < 1.5, to discriminate soil from water pixels;
• a k-means for classifying soil moisture values;
• a classified value ≤0 is detected as cloud;
• a classified value ≥1 is stated as possible shadow, snow or flooded condition;
• a threshold of TCI > 200, to distinguish snow pixels from shadows and flooded condition;
• a 5-pixel buffer neighbouring the detected area with a soil moisture threshold >0.6 is stated as flooded condition;
• a 5-pixel buffer neighbouring the detected area with a soil moisture threshold ≤0.6 is stated as shadow.
Pixels turning out to be shadow are classified as NoData and not displayed on the map. These checks disentangle the shadows classification from their geometric relation with clouds and sun position, allowing the processing and classification of small portions of land.
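The order of the checks can be sketched per pixel as follows. This is a simplified illustration and not the tool's implementation: it assumes that B2/B11 ≥ 1.5 indicates water, that the k-means step has already produced the classified soil moisture value, that TCI brightness is a single 0-255 value, and that the mean soil moisture of the 5-pixel buffer has been computed beforehand. The function name, the argument names and the "clear" label are hypothetical.

```python
def classify_pixel(b2, b11, w_class, tci_brightness, buffer_moisture):
    """Label one pixel following the sequence of threshold checks.

    b2, b11         : blue and SWIR (B11) reflectance, b11 > 0
    w_class         : soil moisture value after the k-means step (assumed given)
    tci_brightness  : True Color Image brightness, 0-255 (assumed scalar)
    buffer_moisture : mean soil moisture in the 5-pixel buffer (assumed given)
    """
    if b2 / b11 >= 1.5:          # assumption: high blue/SWIR ratio means water
        return "water"
    if w_class <= 0:
        return "cloud"
    if w_class < 1:
        return "clear"           # hypothetical label for unflagged pixels
    # w_class >= 1: possible shadow, snow or flooded condition
    if tci_brightness > 200:
        return "snow"
    if buffer_moisture > 0.6:
        return "flooded"
    return "shadow"

labels = [
    classify_pixel(0.05, 0.20, 1.3, 60, 0.4),    # dark pixel, dry surroundings
    classify_pixel(0.05, 0.20, 1.3, 60, 0.8),    # dark pixel, wet surroundings
    classify_pixel(0.60, 0.70, 1.2, 240, 0.4),   # very bright pixel
    classify_pixel(0.30, 0.35, -0.2, 230, 0.4),  # below the dry edge
]
```

Because none of the checks uses cloud position or sun angles, the same function can be applied to a field of any size, which is the property the paragraph above emphasizes.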

Sen2Cor Classification
The SCL algorithm of the Sen2Cor tool [29,30] classifies pixels into 12 possible classes (https://dragon3.esa.int/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm, accessed on 10 February 2021): unclassified pixels (with cloud-low-probability), three types of clouds (cloud-medium-probability, cloud-high-probability, thin cirrus), two types of shadows (dark area and cloud shadow), snow, vegetation, not-vegetated, water, saturated or defective pixels, and no data (Table S5 of the Supplementary Materials). The algorithm consists of a threshold-filtering method applied to the Top Of Atmosphere (TOA) reflectance of Level-1C spectral bands, band ratios, and indices. Once the cloud map is defined, the cloud shadow mask is obtained by integrating the "radiometric" identification of potential cloud shadows from dark areas [36], based on their spectral signatures, with the "geometrically probable" cloud shadows defined by the final cloud mask, the sun position and the distribution of the cloud-top height. Pixels are classified as "cloud shadow" after several steps of threshold filtering.
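For a field-scale comparison, the SCL codes can be tallied against reference shadow pixels. The sketch below is illustrative only: the numeric codes follow the commonly documented Level-2A SCL legend (0 to 11) and should be checked against the product documentation, and the function name and the sample values are hypothetical.

```python
from collections import Counter

# SCL legend, assumed to follow the commonly documented Level-2A coding.
SCL_CLASSES = {
    0: "no data",
    1: "saturated or defective",
    2: "dark area",
    3: "cloud shadow",
    4: "vegetation",
    5: "not-vegetated",
    6: "water",
    7: "unclassified",
    8: "cloud-medium-probability",
    9: "cloud-high-probability",
    10: "thin cirrus",
    11: "snow",
}
CLOUD_SHADOW = 3

def missed_shadow_classes(scl_values, reference_shadow):
    """Count the SCL classes given to reference shadow pixels not labelled as cloud shadow."""
    return Counter(
        SCL_CLASSES[code]
        for code, is_shadow in zip(scl_values, reference_shadow)
        if is_shadow and code != CLOUD_SHADOW
    )

# Hypothetical pixels of one field: SCL code and manual shadow reference.
scl = [3, 4, 4, 2, 7, 3, 5]
ref = [True, True, True, True, True, True, False]
missed = missed_shadow_classes(scl, ref)
```

A tally of this kind shows which classes absorb the omitted shadow pixels, for example vegetation, dark area or unclassified.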

MAJA Classification
The latest version of the MAJA tool for cloud and cloud shadow detection is based on an update of the original Multi-Temporal Cloud Detection (MTCD) method described by [32]. The MTCD method compares a reference composite image, containing the most recent cloud-free pixels, with the latest image in order to identify, through thresholds on several reflectance bands, possible cloudy pixels from an increase in reflectance over time. If the time between the reference image and the last image to be processed is too long, a mono-temporal cloud mask is also defined. A time correlation test of neighbouring pixels is also made. The comparison is not performed at full resolution, to reduce computational time and avoid misclassifications. Once the cloud mask is available, the same multi-temporal and threshold concepts are used to identify the darkening of pixels by cloud shadows. The procedure generates a "geometric" and a "radiometric" cloud shadow mask, the latter used especially to identify shadows cast by clouds outside the image. The final cloud/cloud shadow classification mask (CLM), released by the Theia-Land Data Center at 10 and 20 m resolutions, defines classes by a set of binary bits (https://labo.obs-mip.fr/multitemp/sentinel-2/theias-sentinel-2-l2a-product-format/, accessed on 10 February 2021): all clouds except the thinnest and all shadows; all clouds (except the thinnest); clouds detected via mono-temporal thresholds; clouds detected via multi-temporal thresholds; thinnest clouds; cloud shadows cast by a detected cloud; cloud shadows cast by a cloud outside the image; high clouds detected by the 1.38 µm band (Table S3 of the Supplementary Materials). The Theia-Land Data Center also distributes a "geophysical mask" (MG2) where the two shadow classes of CLM are grouped in a single class and a topographic shadows class is included (Table S4 of the Supplementary Materials).
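Because CLM stores its classes as bit flags, one pixel value can carry several classes at once. The following sketch shows one way to decode it, assuming that the bits follow the order listed above starting from the least significant bit; this ordering should be verified against the Theia product format page. The function names are hypothetical.

```python
# Assumed bit layout: the order listed in the text, from bit 0 upwards.
CLM_BITS = {
    0: "all clouds except the thinnest and all shadows",
    1: "all clouds (except the thinnest)",
    2: "clouds detected via mono-temporal thresholds",
    3: "clouds detected via multi-temporal thresholds",
    4: "thinnest clouds",
    5: "cloud shadows cast by a detected cloud",
    6: "cloud shadows cast by a cloud outside image",
    7: "high clouds detected by 1.38 um",
}
SHADOW_MASK = (1 << 5) | (1 << 6)  # the two cloud shadow classes

def decode_clm(value):
    """Return the names of all CLM classes set in one pixel value."""
    return [name for bit, name in CLM_BITS.items() if value & (1 << bit)]

def is_cloud_shadow(value):
    """True if either cloud shadow bit is set."""
    return bool(value & SHADOW_MASK)

example = decode_clm(33)  # bits 0 and 5 set
```

Testing only the two shadow bits, rather than bit 0, keeps cloud pixels out of a binary shadow/no shadow comparison.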

AgroShadow Tool Validation
Validation of the AgroShadow tool consists of comparing the shadow-masked areas identified by our tool with reference shadow polygons visually recognized on the TCI band composition, for each area and date.
The performance of the shadow mask methodology is evaluated through confusion matrices. A first check (Figure 2a) is made for the shadow/no shadow classifier, additionally considering the pixels of other classes that can induce misclassifications due to fog or that are used as test sites for bright land covers (i.e., concrete and snow).
The classification rates for each class with respect to the other classes are very good (Table 2). No shadow pixels are correctly predicted with a recall (true positive rate) of 98.82%; most misclassifications occur between the no shadow and shadow classes and for crop fields covered by light fog (false positives), which is not recognized (see Avezzano (FOG)-T33TUG field, on 5 April 2020 in the Supplementary Materials, p. 13). Shadow pixels are correctly predicted with a recall of 70.49%, with omission errors (false negatives) towards the no shadow class slightly higher than the commission errors (false positives).
In Figure 2b the multiclass confusion matrix includes particular conditions over vegetated areas, i.e., flooded rice fields and foggy alluvial plains. In this case too, the true positive rate is good, with a recall of 89.01% for vegetated pixels. Misclassification between the shadow and vegetated classes has an omission error higher than the commission error. The false positives in the shadow class are due to a particular soil condition: the field is a rice paddy with a rotation flooding system. Before seeding, when the bare soil of a parcel is flooded and its color turns from light to dark brown, the sharp contrast with the neighbouring parcels is wrongly interpreted as a shadow (see Vercelli-T32TMR field, on 14 April 2020 in the Supplementary Materials, p. 20).

Comparison with Sen2Cor and MAJA Tools
The performances of the AgroShadow, Sen2Cor and MAJA shadow masking methods are compared through binary shadow/no shadow confusion matrices (Figure 3). Considering the shadow and no shadow classes as a whole, the AgroShadow classification rate is very good compared to the other tools, even though MAJA-CLM has a higher recall for the shadow class and Sen2Cor has a higher recall for the no shadow class. Additionally, in terms of overall commission/omission errors, the AgroShadow tool generally shows lower values than the other tools (Figure 3).
In particular, even though MAJA-CLM is able to detect almost all shadow pixels, with a recall of 97.97%, its false positives (red box of Figure 3c) are clearly higher than the AgroShadow misclassifications (upper-right pink box of Figure 3a), especially for particular soil conditions, such as flooded rice fields and alluvial plains (Supplementary Materials, Table S2). This high commission error is due to the lower resolution of the classification process [30] and to a misinterpretation of areas with a sharp reflectance decrease caused by a sudden or strong modification in soil moisture or crop management. In contrast, Sen2Cor (darker pink box of Figure 3b) misses more shadow pixels than AgroShadow (lower-left pink box of Figure 3a), most of them wrongly classified as vegetation, dark area (representing topographic shadows) or unclassified (Supplementary Materials, Table S1). Finally, MAJA-MG2 shows a rather high overall shadow misclassification (Figure 3d). The metrics of the shadow classifiers (Table 3) confirm the validity of the AgroShadow tool for applications at farming scale, strongly reducing the loss of information, which is essential for precision farming practices. The precision, recall, specificity, and false positive rates are very good, with values quite similar to the best scores; the error is the lowest, and the accuracy and F-score are the highest. The number of completely missed classifications is also low, as for MAJA-CLM. Additional results are reported in Tables S1 and S2 of the Supplementary Materials. Table S1 compares false negative (missing shadows) classifications of the three tools based on nine Sen2Cor classes, whereas Table S2 is focused on the correct/incorrect classification of four scenes related to particular conditions: fog, snow, concrete and a rice-alluvial plain.
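The metrics named above all derive from the four cells of a binary shadow/no shadow confusion matrix. The sketch below uses the standard definitions, with the error assumed to be the overall misclassification rate (1 minus accuracy); the pixel counts are hypothetical and are not taken from Table 3.

```python
def shadow_metrics(tp, fp, fn, tn):
    """Standard binary metrics, with shadow as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "false_positive_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,    # assumed: misclassification rate
        "f_score": 2 * precision * recall / (precision + recall),
    }

# Hypothetical pixel counts, for illustration only.
m = shadow_metrics(tp=70, fp=10, fn=30, tn=890)
```

In this hypothetical case the accuracy is 0.96 while the recall is only 0.70, because no shadow pixels dominate the sample; this is why accuracy is best read together with recall and F-score when shadows are rare.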
To visually explain differences among the three classification tools, in the Supplementary Materials we provide some TCI reference images, and the corresponding fields classified by the AgroShadow, Sen2Cor and MAJA tools.
As recently highlighted by [25,33], our findings confirm the poor performance of Sen2Cor in identifying cloudy/shadowed observations, with a high rate of underestimation and the highest number of missed scenes (i.e., 11 wrongly identified as clear scenes, Table 3). Likewise, our analysis shows that the multi-temporal cloud mask enables MAJA to perform better than Sen2Cor, but with high commission errors (shadow overestimation). Given that our study targets precision agriculture applications, the risk is either including images erroneously classified as no shadow/clear sky (Sen2Cor) or discarding many images that contain usable information (MAJA). AgroShadow has the added value of reducing both the high omission errors that characterize Sen2Cor and the high commission errors of MAJA, thus mitigating the weaknesses of the two state-of-the-art tools while preserving and improving their strengths. This result also avoids the computational effort required to implement multiple algorithms in an ensemble tool, as suggested by [33] for cloud detection, by [37,38] for different remote sensing applications, or by [39], which integrates spectral, temporal and spatial information in a three-step cloud/shadow detection. In addition, the AgroShadow tool classifies shadows without requiring cloud locations, being based only on thresholding, and thus avoids the propagation of errors due to cloud misclassification. It should be noted that we chose to evaluate the algorithms in areas of high interest in terms of agricultural activity, without any limitation or preference in terms of disagreement between the three tools and the "visual truth".
Furthermore, the pool of selected study areas includes combinations of different land use, simple or more complex orography and proximity to rivers or the sea, clear sky, any kind of clouds and shadows (shape and dimensions) which make this comparison as complete as possible for correct use and replicability over a broad range of scenarios.
The main limitation of this study concerns the modified-OPTRAM soil moisture index. This model may require calibration for areas with climatic and morphologic conditions that differ from those used for its implementation (i.e., Mediterranean environment, flat-hills and plateau areas, rainfed and irrigated crops) [35].

Conclusions
Current methods for the classification of cloud shadows rely on geometry-identification methods that threshold a single spectral band, reflectance differences or ratios, or derived indices, with a single-date or multiple-date approach. They have shown obvious deficiencies in shadow identification, especially at finer scales.
To address such deficiencies, this paper introduces a new tool, AgroShadow, based on two thresholds (B2/B11 and RGB), a model for soil moisture retrieval and its classification, capable of handling and identifying shadows of any dimension, orientation and shape. Comprehensive tests demonstrate the capacity of the proposed tool to deal with different types of scenarios, in terms of land use, orography, and soil and crop conditions, with a substantial benefit for precision agriculture applications. An F-score of approximately 0.8 and an error of 0.054 indicate the superior capability of the AgroShadow tool to classify shadows, while the precision, recall, specificity, and false positive rates are always similar to the best scores obtained with Sen2Cor and MAJA.
AgroShadow is a simplified tool able to create ready-to-use Sentinel-2 data and can be easily integrated in any image processing chain, thus facilitating interoperability.
However, the proposed method is strictly linked to the OPTRAM model for the soil moisture estimations: it is essential to achieve a correct OPTRAM calibration to avoid shadow misclassification. Furthermore, our tool may fail over bare soil flooded areas surrounded by dry bare soils.
Our planned future work will consist of overcoming this issue and testing AgroShadow in environments with climate and soil conditions different from Mediterranean ones.