Irrigation Mapping on Two Contrasted Climatic Contexts Using Sentinel-1 and Sentinel-2 Data

: This study aims to propose an operational approach to map irrigated areas based on the synergy of Sentinel-1 (S1) and Sentinel-2 (S2) data. An application is proposed at two study sites in Europe—in Spain and in Italy—with two climatic contexts (semiarid and humid, respectively), with the objective of proving the essential role of multi-site training for a robust application of the proposed methodologies. Several classiﬁers are proposed to separate irrigated and rainfed areas. They are based on statistical variables from Sentinel-1 and Sentinel-2 time series data at the agricultural ﬁeld scale, as well as on the contrasted behavior between the ﬁeld scale and the 5 km surroundings. The support vector machine (SVM) classiﬁcation approach was tested with different options to evaluate the robustness of the proposed methodologies. The optimal number of metrics found is ﬁve. These metrics illustrate the importance of optical/radar synergy and the consideration of multi-scale spatial information. The highest accuracy of the classiﬁcations, approximately equal to 85%, is based on training dataset with mixed reference ﬁelds from the two study sites. In addition, the accuracy is consistent at the two study sites. These results conﬁrm the potential of the proposed approaches towards the most general use on sites with different climatic and agricultural contexts.

Pervez et al. [22] used Moderate Resolution Imaging Spectroradiometer (MODIS) derived Normalized Difference Vegetation Index (NDVI) combined with statistical data to map irrigated areas in the United States of America. The annual peak of NDVI was the main criterion for separating irrigated and non-irrigated areas. Boken et al. [23] demonstrated the potential of National Oceanic & Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) to estimate irrigated areas using the NDVI and the vegetation health index (VHI) with coefficients of determination (R 2 ) of approximately 0.49 and 0.80, respectively. Gumma et al. [24] proposed an approach combining Landsat and MODIS point data with a decision tree approach. The accuracy of the fuzzy classification for the irrigated classes varied between 67% and 93%. With the arrival of Sentinel-2 data with high spatial and temporal resolutions capable of following the vegetation cycle at field scale, other studies have proposed even more precise mapping techniques [27,28].
In the last thirty years, microwaves have shown a high potential to characterize land surface properties [29][30][31]. The Synthetic Aperture Radar (SAR) technique offers a high spatial resolution estimate of the radar signal adapted to applications at agricultural field scale [32][33][34]. The measured signal is dependent on the radar configurations (frequency, incidence angle, and polarization) and the dielectric and geometric properties of the surface. After numerous demonstration space missions (ERS, ASAR/ENVISAT, RADARSAT, etc.), the arrival of the Sentinel-1 constellation in the context of the Copernicus program has enabled exponential growth in the use of these signals for monitoring soil moisture and the dynamics of the vegetation cover. Many algorithms have thus been proposed for monitoring the soil water content with approaches based on the change detection technique [35,36], machine learning approaches [37,38] or direct inversions of empirical or physical models [39]. Radar signals have also illustrated a strong potential for monitoring vegetation cover using different configurations (mono-polarization signals, multi-polarization measurement ratio [40], interferometric coherence monitoring [41], etc.). In this context, different approaches have also been proposed to map irrigation, using radar information [42][43][44][45][46][47][48]. On this basis, two approaches have been considered. The first is based on surface soil moisture products as this information could be directly linked to irrigation events. Other studies have considered the radar backscattering coefficient (σ • ) and the use of complementary geophysical parameters. For instance, Gao et al. [42] analyzed different metrics including temporal means, variances, and correlation lengths of radar backscattering coefficient to separate irrigated from non-irrigated areas. Bousbih et al. [39] proposed a similar approach based on soil moisture products extracted from the radar data series. Pageot et al. [43] proposed a combination of different data (radar, optical, and digital elevation model) to map in-season irrigated crops. Classifications of the irrigated crops (kappa = 0.89) are improved compared to those obtained using each type of datum separately (kappa = 0.84). The radar observations allow for taking into account discrepancies on canopy development (speed and amplitude) between various crops and practices and then high improvement in the classification of irrigated fields. Bazzi et al. [44][45][46], in their development of operational tools for irrigation mapping, analyzed the contrasted radar response between the plot scale and the surroundings on a 100 km 2 square. The difference in behavior was essentially related to the presence or absence of irrigation. The vast majority of these proposed approaches are based on training with in situ data and classification primarily using Random Forest [49] or SVM [50].
The choice of metrics with optical or radar data is still generally empirical for each study and has not been explored enough. Often, studies have been concerned with only one test site. In this study, we propose an analysis using metrics based on S1 and S2 data and with an application at two study sites in two contrasted climate contexts to better analyze the robustness of our developments. The objective is not only to highlight this potential for synergy between optical and radar data but above all to prove the contribution of having two sites in the classifier training phase to allow wider use of the proposed tools.
Section 2 describes the two study sites and the database used. Section 3 describes the proposed methodologies to map irrigated fields. Section 4 provides the results and discussion. Section 5 summarizes the conclusions.

(a) Urgell, Spain
The study area, which covers an area of 20 km by 20 km, is located in Urgell, Catalonia, Spain ( Figure 1). The climate is in the Köppen-Geiger zone, a cold semi-arid climate bordering the Mediterranean Sea. It has a mild winter, very dry and warm summers, and two rain seasons in autumn and spring. SIGPAC (Sistema de Información Geográfica de Parcelas Agrícolas) is a system for land parcel identification managed by the Spanish Ministry of Agriculture. This sort of "cadastre" is focused on agricultural management purposes. Every year, the farmers declare all their agricultural parcels. Administrative cross-checking is performed on 100% of declarations to verify that all of them are included in the system, and 5% are inspected to verify the crops. Concerning Catalonia, the system is accessible through http://sig.gencat. cat/visors/Agricultura.html (accessed on 20 January 2022). The irrigation status is one of the fields in the datasets. A plot is considered irrigable only if it has been certified by local water authorities. As such, the status of irrigation is mostly static from year to year.
In this study, the SIGPAC data (plot delineation, land usage, and the presence or absence of irrigation) are used as ground truth to validate our classification. The study area is irrigated using the water coming from the Pyrenees through the Urgell canal. Three different irrigation methods can be identified: (i) flooding, which is still dominant in this region and is used for all kinds of crops, including fruit trees (apple and pear orchards) and cereals; (ii) sprinkler irrigation (pivot, ramp, and integral), which is mainly used for alfalfa and cereal crops (maize, barley, and wheat); and (iii) localized irrigation (mainly drip irrigation), which is mainly used for vegetables. The SIGPAC land cover map in our study area is shown in Figure 1. It is mainly composed of agricultural plots with annual crops (wheat, corn, and alfalfa), fruit trees (olive trees or vineyards . . . ). The natural vegetation and particularly the forests are mainly located in mountainous areas. The dataset consists of 65,306 irrigated plots and 65,307 non-irrigated plots.

(b) Emilia Romagna Region, Italy
The Italian study area is located in the Po River Valley (PRV), one of the most productive agricultural areas in northern Italy. Almost the entire PRV is equipped for irrigation [51], especially in recent years. Irrigation has played a crucial role in crop development due to seasonal drought events that have strongly impacted the agriculture sector. More specifically, the test area ( Figure 2) is located in the Emilia Romagna Region and covers an area of about 850 km 2 . According to the Köppen-Geiger classification, the climatic zone can be classified as "Cfa", which refers to a temperate climate with hot summers and without a dry season. The test site and, more generally, the PRV, is characterized by a smaller size of plots as compared to other European regions, such as the test site in Urgell (Catalonia), Spain. As reported in Massari et al. [19], the size and spatial structure of irrigated fields and districts are strongly related to historical reasons and technical limitations due to the ancient Roman centuriation. This specific feature is also reflected in a significant variability in crop types. Information on the main crops growing on the selected test site was collected from the project for crop classification through remote sensing (iColt, https://sites.google.com/ arpae.it/servizio-climatico-icolt/home?authuser=0 (accessed on 20 January 2022)) led by the Regional Agency for Prevention, Environment, and Energy of Emilia-Romagna (Arpae). The iColt map for the test site is reported in Figure 2, showing the variety of crop types cultivated at the plot scale for the year 2018. Generic summer crops and vineyards, together with meadows, account for the most representative crop types. In this study, the iColt dataset was used as ground truth for validating the proposed classification method. The data are in the first step processed using the Toolbox-Sentinel-1 by considering four steps: elimination of thermal noise, radiometric calibration, correction of the effect of the terrain using a digital terrain model SRTM at 30 m resolution, and finally the application of the adaptative Lee filter [52] to allow for a reduction in speckle. This calibration makes it possible to convert the raw acquired data into backscattering coefficients.

Satellite Database
The optical images were acquired by Sentinel-2A, launched on 23 June 2015, and Sentinel-2B, launched on 7 March 2017. The two satellites allow data acquisition with a repeatability of 5 days and a spatial resolution of 10 m × 10 m. For the applications proposed in this study, orthorectified Level 2A surface reflection images are considered. Pre-processing of data to correct atmospheric effects and mask cloud cover was performed using Maja software [53]. The data were downloaded from the Theia site (https://www.theia-land.fr (accessed on 20 January 2022). The NDVI, expressed as NDVI = (RNIR − RRED)/(RNIR + RRED), where RNIR (band 8 with wavelength equal to 842 nm) is near-infrared reflectance (NIR) and RRED (band 4 with wavelength equal to 665 nm) is red reflectance, was calculated from these images [54]. From the NDVI values for each pixel, a spatial average was estimated for each reference field and each acquisition date to reconstruct the temporal dynamics of the vegetation.

The Experimental Setup
The experimental setup is composed of different steps. The first one is the data preprocessing of satellite images. Each agricultural plot is represented as a single feature that has several properties such as surface, land use, etc. Using the center of each feature, a rectangular zone is defined to represent a 5 km × 5 km area that surrounds the original plot. Secondly, we used metrics of backscatter radar signal and NDVI time series at field scale and over a buffer of 5 km radius. Finally, irrigation mapping was performed using the metrics of each plot, the classification function, and field boundary information. The overall workflow using the support vector machine (SVM) is shown in Figure 3.

Analyzed Metrics
To build the classification method, we first selected approximately 2000 fields for each site to perform learning and modeling. The area of the selected plots generally varied from 3 ha to 25 ha. An analysis of the backscatter and NDVI time-series statistics was carried out for the selected areas, including irrigated croplands and non-irrigated croplands. The radar signal is sensitive to the presence or not of irrigation. Indeed, the scattered signal is directly sensitive to soil water content: it increases with the increase in this parameter which changes with irrigation. Similarly, the radar signal is highly sensitive to the density of the vegetation cover, with particularly high sensitivity of the VH polarization signal to the volume component of the vegetation.
Analysis of the temporal-related metrics was the first step in the analysis of the possibility that the two types of classes can be separated. The analyzed metrics included the mean value of VV, VH, and VH/VV as well as the temporal variance of the two signals, VV and VH. The spatial resolution was also considered. In fact, we considered the ratio between measures at the plot scale and the 5 km spatial scale, which was considered approximately sensitive to precipitation without irrigation effects [32][33][34]. Vegetation was also analyzed with the introduction of four metrics, the mean and variation of the NDVI, considering the two spatial scales as well. The NDVI was generally higher on the irrigated fields. Table 1 summarizes all tested metrics. The metrics in Table 1 were considered for the application of the object-based classification approach. The statistics were thus recorded at field scale.

Support Vector Machine
The mapping of irrigated areas is based on different classification methods habitually used for land use mapping [55][56][57][58]. For each class label, in addition to the selected feature vector, a set of data samples are associated. Thus, the parameters associated with each classifier are estimated from this set of training data. In this study, we adopted the point classification approach used with the SVM. SVMs were selected because a simple feature vector can be used as input for the discrimination task. Support vector machine (SVM, also known as support vector network) is one of the most applied methods for remote sensing image classification studies. It is a question of finding with the classifier a hyperplane (a decision boundary), making it possible to optimally separate two sets of data and also to maximize the margin between these two sets. The calculation of the minimum distance between the two samples closest to the hyperplane makes it possible to estimate this margin. In the context where the data are not linearly separable, it is possible to go to more than two classes in the structuring of the training data. The SVM can thus be extended to multiclass classification by practicing a decomposition into a predefined series of binary problems. SVM can also analyze data that are not linearly separable. In this case, a kernel function [59], a nonlinear transformation is applied to the input data. Among the various kernel functions, Gaussian and Radial Basis Function (RBF) kernels are the most popular in remote sensing studies. In this study, we chose to consider a Gaussian Radial Basis Function (SVM-RBF). The appropriate choice of a kernel function is essential to enable a strong ability of SVM to solve the most complex problems.

Accuracy Parameters
The purpose of this section is to propose an evaluation of the classifier tested in this study. The database is divided into two sets: the first one for training and the second one for the validation of the proposed algorithms. The confusion matrix is represented by a square structure, where the columns are the predicted values, and the rows represent the reference data. The diagonal of the matrix corresponds to the number of pixels correctly classified. All matrix elements outside the diagonal correspond to errors in classifications.
We considered two indicators [60] to evaluate the quality of the classifiers: -The overall accuracy (OA) is the ratio of well-classified pixels to the total number of validating pixels. This index is equal to zero 0 if no pixel is correctly classified and is equal to 1 if all pixels are correctly classified. - The kappa coefficient (Kc) is a measure of precision, which considers positive results that occur at random. The value of this coefficient illustrates the quality of the classification. Thus, the value is between −1 and 1, where a value of −1 corresponds to a very bad classification and the value 1 corresponds to an excellent classification. Despite the very popular use of this coefficient, it is essential to remain vigilant in interpreting its values [61].

Reduction in the Metrics Set
All possible combinations of the metrics identified in Table 1 are tested to find the optimal combination for the separation between the two irrigated and non-irrigated classes. This analysis is based on the two criteria of OA and Kc. Thus, we conducted tests with a single metric (such as µ (VV_field) or µ (VH/VV_field)), then all the possible combinations with two metrics ((µ (VV_field), µ (VH/VV_field)) or ((µ (VV_field), µ (NDVI_field)), etc.), or with three, four, five, six metrics, etc. From these tests, the lowest precision was retrieved with a single metric, and then an improvement in this precision with the increase in the number of metrics was observed. The maximum of accuracy is reached with five metrics. Above five metrics, we observe a continuous slight decrease. This behavior could be explained by the strong redundancy of information to describe the same surface properties, which could generate additional noise in the training phase. Finally, analysis concludes that an optimum of 5 metrics (µ (VV_field), µ (VH/VV_field), µ (NDVI_field), µ (VV_5 km)/µ (VV_field), and Var(VV_field)) gives the best accuracy equal to 86%. It is a combination of parameters from radar and optical time series that provides the highest contribution to irrigation mapping. These five metrics are considered later in the discussion of the different training and validation schemes at our two study sites.

Classification Validation
Three training schemes are tested to evaluate the robustness of the proposed approach. The first is based on training and validation at the same test site; the second is testing a training scheme at the first site and validation at the second site; and finally, the third is testing with a mixed training base for validation of both tests. The objective of this exercise is to analyze the applicability of the proposed approach and to go as far as possible toward an automatic global application of the proposed algorithm. Table 2 displays the results obtained for the different schemes. For the first scheme, we have accuracy results on the order of 90% for the Spanish site and 88% for the Italian site. We see a better result in the Spanish site, which is obviously explained by a semiarid context where it is simpler to separate irrigated and non-irrigated areas. In fact, the low rainfall in dry periods, especially in summer, shows marked differences in terms of soil moisture and plant cover development. In a wetter context such as northern Italy, these differences are less marked, with a generally well-developed canopy and moisture levels that remain relatively high except in the context of exceptional drought. The second scheme concerns the validity test of a training process carried out at one site and a validation at a second site. Thus, a training process was carried out at the Spanish site and validated at the Italian site, and conversely, a training process was carried out at the Italian site and validated at the Spanish site. In both cases, we observe a high decrease in accuracy compared with training at the same site. This is explained by a training context that does not match the context of the site in terms of soil moisture evolution and canopy dynamics.
Finally, the last scheme concerns the training based on test fields from the two study sites (half plots at the Spanish site and half plots at the Italian site). The validation displays an accuracy on the order of 87% at the Spanish site and 84% at the Italian site. This third scheme thus shows accuracy close to the maximum reached with training and validation at the same study site. These results indicate the need for multisite training to ensure the generalization of strong details of the separations between irrigated and non-irrigated areas. Indeed, as was specified before, the nature of the irrigation systems and the climatic contexts varies from one site to another. Thus, this implies enriching the training contexts to ensure the most similarities with local contexts. Figure 4 illustrates the maps of irrigated and non-irrigated fields at the two study sites using training datasets from both sites. The case of the Urgell site (Spain) illustrates a clear concentration of irrigated plots in approximately the same area. This analysis does not take into account a separation between the different types of irrigation, which may have important effects on the application of the irrigation algorithm.   Figure 5 illustrates the contributions of the five parameters considered for the classification according to the considered methods and sites. Three options are considered: the first with training and validation over the Spanish site (Spain−Spain), the second with training and validation over the Italian site (Italy−Italy), and the last with training and validation over the two sites (Spain−Italy). We observe that for training at the Spanish site, four parameters-µ (VV_field), µ (VH/VV_field), µ (NDVI_field), and µ (VV_5 km)/µ (VV_field)-illustrate close contributions. These results illustrate a significant contribution from the µ (VV_field) parameter strongly related to soil moisture, a significant contribution from the µ (VH/VV_field) parameter strongly related to the dynamics of the vegetation cover, a strong contribution from the µ (NDVI_field) parameter related to the dynamics of the canopy, and finally to an equally important contribution from the µ (VV_5 km)/µ (VV_field) report, linked to the effect of irrigation at the plot scale compared with a larger scale dependent on precipitation events. At the Italian site, the contributions from the parameters are not the same, and we can clearly see the highest contribution coming from the µ (VH/VV_field) parameter and thus a role linked to the dynamics of the vegetation cover. The effects of soil moisture that can appear in the two parameters, µ (VV_field) and µ (VV_5 km)/µ (VV_field), seem weaker. This is particularly linked to the moisture context at the Italian site, where the effects linked to soil moisture are less pronounced or detected than in the case of semiarid areas. Otherwise, in a context with a frequent presence of cloud cover, the µ (NDVI_field) parameter also lowers its contribution. In the context of mixed training, the effect of the µ (VH/VV_field) parameter remains predominant but has significant contributions from the parameters µ (VV_field), µ (NDVI_field), and µ (VV_5 km)/µ (VV_field).

Discussion
The accuracy achieved by considering mixed training between two sites is close to what could be found with other single-site studies. Bazzi et al. [44] applied a neural network and a random forest classifier using Sentinel-1 and Sentinel-2 data over the Urgell region. The best performance was obtained with the NN (OA = 94%, Kappa = 87) compared to RF (OA = 92%, Kapa = 88). Other authors reached similar performances with the combination of optical and microwave imagery ( [27,43] in south-west France, [42] in Spain, [62] in India). Interestingly enough, Demarez et al. [27] showed that radar imagery improved the detection of irrigated areas in the early irrigation season. However, this article allows for consolidation of the robustness of a machine learning type approach which can give very good results on one site but without precise applicability on others with different climatic contexts. The use also of VH/VV information seems to be very contributive to the quality of these results. This confirms its sensitivity to vegetation properties retrieved in other studies [40]. This information describing vegetation growth could contribute to the general improvement of the classifications of irrigated areas in regions with high cloud cover. Indeed, the approaches proposed with simple interpolations of optical data could have strong limitations in the context of long durations of meteorological disturbances, as is the case in tropical areas or certain humid regions in Europe. The classification proposed in this study is made without distinction between types of land use or types of irrigation. These are surely factors in the degradation of the precision of the classifications. Indeed, radar signals (VV, VH/VV configurations) can act differently between crop types in terms of canopy attenuation effects or volume scattering contribution. This is particularly visible between cereals with strong attenuation of the radar signal linked to the vertical geometric structure of plants. Conversely, maize displays a much greater volume scattering [7]. Regarding irrigation, various studies have highlighted the complexity of the scattered radar signal in the context of different types of irrigation (surface, drip, sprinkler). Surface irrigation can produce opposite effects when plots are flooded, with a drop in the signal which can be interpreted as a sharp drop in soil moisture. Drip irrigation generates important soil moisture heterogeneities that make scattered radar signal sensitivity to land surface more complex to model [63]. Thus, any finer analysis taking these factors into account would be very useful in refining the classification and making it more generalizable. Ancillary data such as soil description, topography, or even statistical data [64] would probably reinforce the ability of the classifier to separate irrigated from non irrigated areas. Last but not least, as irrigated crops are supposed to have both major vegetation cover and colder surface temperature, the conjunct use of thermal infrared observations seems to also serve as evidence [65], except for the handicap that cloud-masked data cannot be time interpolated.

Conclusions
The main objective of this study is to propose a mapping irrigation approach that is close to the operational approach. Sentinel-1 and Sentinel-2 time series are considered at two study sites: the Urgell site in Spain and the Emilia Romagna region in Italy. Several metrics are tested to separate irrigated from non-irrigated areas. The metrics that seem to contribute the most to this classification are µ (VV_field), µ (VH/VV_field), µ (NDVI_field), µ (VV_5 km)/µ (VV_field), and Var(VV_field). The first provides essential information on the water content of the soil. The second provides information on the dynamics of the vegetation cover; the third also provides information on the growth of the vegetation cover; the fourth provides related information on multiscale effects strongly linked to the context of precipitation; and finally, the last parameter provides information on the variability of temporal signals at the field scale, which is notably linked to the frequency of variations in the soil moisture signal. Using the SVM approach, we identified three training schemes, the first based on training at a site with validation at the same site. The second is based on training at one site and validation at another site, and the third is based on multisite training. We observe that this last case makes it possible to illustrate the best result overall, with very strong details at the two sites and an accuracy of approximately 85%. This allows for better extraction of the most efficient metrics at the two sites, especially in regard to two different climatic contexts. The degradation of classifier precision when the validation is performed on a site that is not the same one used for training is demonstrated. In this case, we go to precisions below 65%. This result particularly highlights the need to use as many sites as possible in the training phase to allow the widest possible applicability and to select the most adapted and common metrics for separation between irrigated and non-irrigated fields. In the future, the enrichment of this analysis by other essential parameters, such as land use and the type of irrigation, will be essential to propose a more operational approach.

Conflicts of Interest:
The authors declare no conflict of interest.