1. Introduction
Agriculture is the largest consumer of fresh water at the global scale. In 2008, Wisser et al. [1] reported that 70% of surface and groundwater resources were used for agricultural purposes. This figure has been confirmed in more recent studies [2,3], and it is set to increase further due to global population growth and an increasing demand for food and biomass [4,5]. At the same time, competition for the allocation of water resources is projected to intensify due to anthropogenic climate change, increased water withdrawal by industry and urban consumers, and the continued growth of urban areas [6]. Agricultural water demand is mainly driven by irrigation [5]. Here, we define irrigation as the temporary or continuous supply of water to crops. Such water supply can either compensate for fluctuations in precipitation and thus reduce inter-annual yield variations [7,8] or increase yields in arid areas where crop growth is hardly possible without irrigation [9]. Irrigation is therefore considered a central asset for closing yield gaps and expanding agricultural activities in space and time [10].
While there are manifold irrigation techniques, center pivot irrigation systems (CPSs) can be found in many parts of the world, including North America [11], Saudi Arabia [12], South Africa [13], China [14], and Brazil [15,16]. In Europe, CPSs are mainly used in Mediterranean countries, including Spain, France, and Italy [17]. CPSs are characterized by overhead sprinklers mounted on a movable arm that irrigates crops in a circular pattern. Low acquisition and maintenance costs, as well as a high degree of flexibility regarding sector-wise variable irrigation rates, have made CPSs particularly attractive [18,19]. The latter is also important for improving water-use efficiency through site-specific farming measures [20].
Due to their circular or circle-like shapes, CPSs can be mapped using high-resolution optical remotely sensed imagery (spatial resolution of 30 m or finer). A prominent example is the so-called “Nebraska Centre-Pivot Inventory” [21], which in its latest version (2005) covers more than 50,000 CPSs in the state of Nebraska, USA. The CPSs were mapped by human experts using Landsat-TM and aerial imagery. With the advent of freely available optical remotely sensed imagery (e.g., from the Sentinel-2 mission of the European Space Agency) and advances in the power of graphics processing units (GPUs), deep learning (DL) models have gained attention for detecting and mapping CPSs in an automated and reasonably fast manner. Zhang et al. [22] applied three different convolutional neural networks (CNNs) to Landsat-5 TM data covering 20,000 km² in Colorado, USA, using the three TM channels in the visible part of the electromagnetic spectrum. The authors reported successful detection of CPSs, with a precision (i.e., the share of positive predictions that are correct) of about 95.85% and a recall (i.e., the share of actual CPSs that the classifier identifies) of about 93.33%. Their approach, however, was restricted to the detection of CPSs and did not allow for mapping the exact size and shape of the CPSs. This limitation was overcome by a more recent approach by Saraiva et al. [16], who trained a semantic segmentation algorithm, namely U-NET, on high-resolution (3.0 to 3.7 m) Planet imagery of a study area in Brazil. U-NET [23] is a state-of-the-art DL architecture built upon the “fully convolutional network” proposed by Shelhamer et al. [24]. U-NET has a downsampling and an upsampling branch of convolutional layers and preserves spatial context by connecting the two. In detail, spatial (i.e., contextual) and spectral information is passed from the downsampling to the upsampling branch in the form of feature maps that describe the activations of the individual network layers. Thus, localization information can be passed on from input to output, and spatial relationships can be explicitly considered. Using the three visible bands and the near-infrared band, for instance, Saraiva et al. [16] achieved a remarkable precision of 99% and a recall of 88% for center pivot segmentation, making U-NET a promising tool for mapping CPSs.
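To make the role of the skip connections more concrete, the following sketch traces feature-map shapes through a generic U-NET-style encoder/decoder. It is an illustration under stated assumptions, not the configuration used in this study: the helper `unet_shapes`, the tile size, the base filter count of 64, and the depth of 4 are all hypothetical, and "same"-padded convolutions are assumed so that each skip connection matches its decoder level exactly (the original U-NET [23] uses unpadded convolutions and crops the skip connections instead). The number of input channels (e.g., four bands or three principal components) only affects the first convolution and therefore does not change the traced shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMap:
    height: int
    width: int
    channels: int


def unet_shapes(height, width, n_classes=2, base_channels=64, depth=4):
    """Trace feature-map shapes through a U-NET-style encoder/decoder.

    Illustrative assumptions: 'same'-padded convolutions (spatial size kept),
    2x2 pooling and 2x2 upsampling, channel count doubled per level.
    """
    factor = 2 ** depth
    if height % factor or width % factor:
        raise ValueError(f"input size must be divisible by {factor}")
    h, w, channels = height, width, base_channels
    skips = []
    for _ in range(depth):
        # Convolutions at this level produce `channels` feature maps, which are
        # stored as a skip connection before pooling.
        skips.append(FeatureMap(h, w, channels))
        h, w, channels = h // 2, w // 2, channels * 2  # pooling; more filters
    bottleneck = FeatureMap(h, w, channels)
    decoder = []
    for skip in reversed(skips):
        # Upsampling doubles the resolution and halves the channel count ...
        h, w, channels = h * 2, w * 2, channels // 2
        # ... and the skip connection from the downsampling branch is
        # concatenated, re-injecting spatial detail lost during pooling.
        decoder.append(FeatureMap(h, w, channels + skip.channels))
    # A final 1x1 convolution maps to per-pixel class scores.
    output = FeatureMap(height, width, n_classes)
    return skips, bottleneck, decoder, output
```

For a hypothetical 256 × 256 pixel tile, `unet_shapes(256, 256)` yields a 16 × 16 × 1024 bottleneck, while the last decoder level is back at 256 × 256: the concatenated skip connections are what allow the network to localize CPS boundaries at full resolution.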
While these studies highlight the potential of DL-based approaches, and of U-NET in particular, for mapping CPSs, some open questions remain. Firstly, if U-NET is one day to replace human experts, it has to be capable of dealing with contrasting geographic locations while providing consistently high segmentation and classification accuracy. Research on the geographic transferability of DL approaches, however, is still in its infancy. In a comprehensive review of deep learning methods in remote sensing, Zhu et al. [25] conclude that transferring models trained on geographically limited datasets to the global scale is an ongoing challenge. Since CPSs can be found in many parts of the world, the question arises as to how well a model trained on a single study area performs when transferred to geographically contrasting areas. Arguably, this aspect is crucial for the applicability of U-NET to wider areas and will be decisive when attempting to replace human experts. Secondly, most deep learning implementations in remote sensing use only a small part of the available spectral information (see, e.g., References [26,27]). The visible bands of remotely sensed imagery obtained from sensors like Sentinel-2 MSI are highly correlated spectrally [28] and provide only limited information about vegetation characteristics, yet they are often used in DL studies. Since using a higher number of spectral bands would significantly increase computational requirements, we investigated how the widespread principal component analysis (PCA) approach for spectral dimensionality reduction [29] affects U-NET CPS classification and segmentation accuracy.
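A minimal sketch of spectral PCA for a multiband image is shown below, assuming the bands are already stacked as a `(rows, cols, bands)` NumPy array on a common grid. The function name `spectral_pca` is hypothetical, and whether the bands were standardized before PCA in this study is not stated here; the sketch uses covariance-based PCA on unstandardized, mean-centered bands. Each pixel is treated as one observation, the band covariance matrix is eigendecomposed, and the pixels are projected onto the eigenvectors with the largest eigenvalues.

```python
import numpy as np


def spectral_pca(image, n_components=3):
    """Project a (rows, cols, bands) image onto its first principal components.

    Returns the component image (rows, cols, n_components) and the fraction of
    the total variance explained by each retained component.
    """
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(np.float64)  # one row per pixel
    pixels -= pixels.mean(axis=0)                          # center each band
    cov = np.cov(pixels, rowvar=False)                     # bands x bands
    eigvals, eigvecs = np.linalg.eigh(cov)                 # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]       # largest first
    scores = pixels @ eigvecs[:, order]                    # project pixels
    explained = eigvals[order] / eigvals.sum()
    return scores.reshape(rows, cols, n_components), explained
```

Because highly correlated bands share most of their variance, the first few components typically retain most of the information in far fewer channels, which is what reduces the input size of the network. Library implementations such as scikit-learn's `sklearn.decomposition.PCA` offer the same projection.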
Consequently, the objectives of this work were twofold. Firstly, we compared the classification and segmentation accuracy of U-NET trained on four spectral Sentinel-2 bands to that of U-NET trained on the first three principal components of Sentinel-2, using a CPS dataset from Northern Texas, USA. Secondly, we assessed how well these two models performed when transferred to geographically contrasting areas: South Africa and the Duero basin in Spain. The structure of the paper follows these objectives. After describing the study areas and the CPS datasets, Section 2 explains the processing of the Sentinel-2 imagery and outlines the training data generation and model training process. In Section 3, we present the results of the semantic segmentation approach, compare the performance of the two U-NET implementations, and assess their geographic transferability, which is then discussed in Section 4.
3. Results
3.1. U-NET Training
The results of training the two models over a total of 150 epochs on the Texas study area are shown in Figure 3. The orange line represents the average training loss per epoch, and the blue line denotes the value of the cross-entropy (CE) cost function applied to the testing data after each epoch. U-NET PCA (Figure 3a) shows a clear convergence of training and testing loss values in the last two thirds of the training time: the testing loss fluctuates around 0.3, while the training loss approaches 0.1. For U-NET SPECS (Figure 3b), the training loss behaves similarly, but the testing loss is markedly higher from epoch 50 onwards and rises sharply at epoch 110. The training of U-NET PCA took about 1779 min (1 d, 5 h, 39 min, and 30 s) and was therefore faster than that of U-NET SPECS, which needed 2052 min (1 d, 10 h, 12 min, and 36 s) to pass through all 150 epochs.
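For orientation, a minimal sketch of a pixel-wise cross-entropy loss for a two-class (CPS vs. background) segmentation is given below. The exact CE formulation used for training (binary vs. two-class categorical, any class weighting) is not specified in this section; the sketch assumes an unweighted binary form, averaged over all pixels, with a hypothetical function name. Under that assumption, a model that predicts a probability of 0.5 everywhere has a loss of ln 2 ≈ 0.69, which provides a reference point for the reported testing loss of about 0.3 and training loss of about 0.1.

```python
import numpy as np


def pixel_cross_entropy(probabilities, labels, eps=1e-7):
    """Mean binary cross-entropy over all pixels of a CPS probability map.

    probabilities: predicted probability of the CPS class per pixel (0..1)
    labels: reference mask (1 = CPS, 0 = background)
    """
    # Clip to avoid log(0) for fully confident predictions.
    p = np.clip(np.asarray(probabilities, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(labels, dtype=np.float64)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
```

Confident correct predictions drive the loss toward zero, whereas confident wrong predictions are penalized heavily, which is why a diverging testing loss (as for U-NET SPECS after epoch 110) signals overconfident errors on unseen data.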
The network weights of each final U-NET model were selected based on the training loss. In the case of U-NET PCA, the global minimum of the training loss was reached after 83 epochs, so the network weights from this epoch were used. In the case of U-NET SPECS, this minimum was reached after 45 epochs.
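Selecting the weights from the epoch with the lowest loss amounts to keeping a checkpoint whenever a new minimum is reached. The class below is a hypothetical, framework-independent sketch of that bookkeeping and does not describe the implementation used in this study. Deep learning frameworks provide comparable mechanisms, for example Keras' `ModelCheckpoint` callback with `save_best_only=True`.

```python
import copy


class BestCheckpoint:
    """Keep a copy of the model weights from the epoch with the lowest loss."""

    def __init__(self):
        self.epoch = None
        self.loss = float("inf")
        self.weights = None

    def update(self, epoch, loss, weights):
        # Only a strictly lower loss replaces the stored checkpoint,
        # so ties keep the earlier epoch.
        if loss < self.loss:
            self.epoch, self.loss = epoch, loss
            # Deep copy so later training steps do not alter the snapshot.
            self.weights = copy.deepcopy(weights)
```

During training, `update` would be called once per epoch with the monitored loss; after the last epoch, `weights` holds the snapshot used for the final model.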
3.2. Pixel-Based Error Metrics
The pixel-based metrics (see Table 5) are listed in Table 6 for both U-NET models and all three study areas. The better-performing model per study area is marked in green and the worse-performing model in red; orange cells indicate that both models achieve the same score. Both U-NET models showed the highest values for precision, recall, f1-score, and AUC in Texas, where the models were trained (see Section 3.1). The results for the other two study areas, which were used to assess the geographic transferability of the approach, indicate a lower classification accuracy for both models, with the Duero study area clearly showing the lowest f1-scores (0.08 for U-NET PCA and 0.16 for U-NET SPECS). The South African study area occupies a middle position in relative terms. Transferring the models to other geographic areas thus reduces classification quality, although to a different degree for each model.
In detail, the predictions in Texas have high precision for both U-NET PCA (0.91) and U-NET SPECS (0.85). The recall of U-NET SPECS is also high (0.89), whereas that of U-NET PCA is lower (0.76). Accordingly, U-NET SPECS has the higher f1-score (0.87 vs. 0.83). The AUC is also slightly higher for U-NET SPECS (0.88 vs. 0.84), as is the accuracy (0.88 vs. 0.83).
In the Duero study area, U-NET SPECS again has a higher accuracy than U-NET PCA (0.94 vs. 0.64). The precision is very low for both models (U-NET PCA: 0.04, U-NET SPECS: 0.16). While the recall is also very low for U-NET SPECS (0.17), it is much higher for U-NET PCA (0.50). As a result, the f1-score is low for both models and lowest for U-NET PCA at 0.08 (U-NET SPECS: 0.16). As the AUC of 0.57 for both models indicates, the U-NET models performed only slightly better than a random classifier (AUC of 0.5).
In the South African study area, where overall model performance was higher than in the Duero area but lower than in Texas, all metrics indicate a better performance of U-NET SPECS. U-NET SPECS has the higher accuracy (0.73 vs. 0.57), precision (0.77 vs. 0.58), recall (0.61 vs. 0.35), and, consequently, f1-score (0.68 vs. 0.43). The same applies to the AUC (0.72 vs. 0.56); the AUC of 0.56 for U-NET PCA is also the lowest value across all three study areas.
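The metric definitions in Table 5 are not reproduced in this section, so the following is a sketch based on the standard definitions of these scores. It assumes flat, equally long binary masks (1 = CPS, 0 = background), and all names are hypothetical. The AUC line additionally assumes hard 0/1 predictions: with a single operating point, the area under the ROC curve reduces to the mean of the true positive and true negative rates. If AUC was instead computed from class probabilities, it would generally differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    auc: float


def _ratio(numerator, denominator):
    # Return 0.0 instead of failing when a class is absent
    # (e.g., no pixel predicted as CPS).
    return numerator / denominator if denominator else 0.0


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return _ratio(2 * precision * recall, precision + recall)


def pixel_metrics(predicted, reference):
    """Pixel-wise scores for two flat binary masks (1 = CPS, 0 = background)."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of pixels")
    tp = fp = fn = tn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p:
            fp += 1
        elif r:
            fn += 1
        else:
            tn += 1
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)        # true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    return PixelMetrics(
        accuracy=_ratio(tp + tn, tp + fp + fn + tn),
        precision=precision,
        recall=recall,
        f1=f1_from(precision, recall),
        # Hard predictions give one ROC operating point; the area under the
        # resulting curve equals the mean of the TPR and TNR.
        auc=(recall + specificity) / 2,
    )
```

As a consistency check, the f1-scores reported for Texas follow from the reported precision and recall: `f1_from(0.91, 0.76)` rounds to 0.83 (U-NET PCA) and `f1_from(0.85, 0.89)` rounds to 0.87 (U-NET SPECS). For probability-based AUC, scikit-learn's `roc_auc_score` is a common choice.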
3.3. Segmentation Results
In addition to the pixel-based metrics, segmentation quality was assessed by visual comparison with the CPS reference geometries. Segmentation quality refers to the capacity of the model to reproduce the manually mapped center pivot geometries. As with the quantitative evaluation of classification quality (see Section 3.2), the qualitative visual examination shows a decrease in segmentation quality from Texas through South Africa to the Duero region, where both models performed extremely poorly.
3.3.1. Texas Study Area
Figure 4 displays the results of the semantic segmentation for the part of the Texas study area used for validation. The results of U-NET PCA are shown on the left side of Figure 4 and those of U-NET SPECS on the right side. U-NET PCA reproduced the smaller CPSs in particular with high quality, with only occasional omissions. However, the larger CPSs in the center of the map were mapped only fragmentarily by U-NET PCA. In the western part of the area, which contains no CPSs, some false positives can be found.
In the case of U-NET SPECS, the misclassifications in the western area are more spatially dispersed, i.e., speckle-like rather than organized into larger spatial clusters (Figure 4b). In addition, not all CPSs were reproduced in their circular form, and a few CPSs were not detected at all. The larger CPSs also reveal classification and segmentation problems (e.g., larger parts of the center pivots were not assigned to the center pivot class), but at least individual segments of the circles were correctly recognized.
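The distinction between speckle-like errors and larger spatial clusters is made visually here. One way to quantify it, which is not part of the study's method, is to count the connected components of misclassified pixels: speckle produces many small components, whereas clustered errors produce a few large ones. The sketch below is a hypothetical helper that labels components with a breadth-first flood fill on a binary mask given as nested lists. `scipy.ndimage.label` provides the same operation for NumPy arrays.

```python
from collections import deque


def component_sizes(mask, connectivity=4):
    """Sizes of connected groups of positive pixels in a 2-D binary mask,
    sorted from largest to smallest."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill of one connected component.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                cr, cc = queue.popleft()
                size += 1
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            sizes.append(size)
    return sorted(sizes, reverse=True)
```

With 4-connectivity, a checkerboard of eight false positives yields eight components of size 1 (speckle), while two 2 × 2 blocks with the same number of pixels yield two components of size 4 (clusters).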
3.3.2. Duero Study Area
For a part of the Duero area, the results of the two models are shown in Figure 5, analogous to Figure 4. The poor segmentation quality of U-NET PCA is clearly visible in Figure 5a, where large, contiguous areas of pixels are assigned to the CPS class. This is especially the case in the northern part of the area, but not limited to it. Some of these areas correspond to CPSs in the reference, but the segmented geometries have little in common with the actual CPS geometries.
A completely different picture emerges from the results of U-NET SPECS (Figure 5b): considerably fewer pixels are classified as CPSs than with U-NET PCA. The segmented areas are spatially more separated from each other but often do not correspond to the actual CPS geometries. Only a very small portion of the CPSs (in the north-western part) was successfully segmented.
3.3.3. South Africa Study Area
Finally, the results for the South African study area are shown in Figure 6, where, as in the Duero study area (Figure 5), U-NET PCA tends to misclassify large, connected areas. Only a small portion of the reference CPSs is reproduced with high segmentation quality; many CPSs remain undetected or are covered by only a few pixels that occupy a very small part of the actual area of the reference geometries (i.e., a small intersection over union).
U-NET SPECS (Figure 6b) did not provide completely accurate segmentation and classification results either, since not all CPSs were found and not all objects were completely segmented. Nevertheless, many segments correspond to the reference geometries, and large-area misclassifications are fewer and less dominant in visual inspection than with U-NET PCA (Figure 6a).
5. Conclusions
We trained two U-NET models for semantic segmentation of CPSs in Texas and applied the two resulting networks to other geographic areas with CPSs: the Spanish Duero Basin and South Africa. We were able to show that reducing the spectral feature space by means of principal component analysis shortens computation time and stabilizes the training process but does not improve classification and segmentation quality. We assume that effective dimensionality reduction should consider spatial (i.e., contextual) properties in addition to spectral attributes. Since algorithms such as U-NET are expected to increasingly automate manual mapping, we investigated the generalizability of both U-NET models with respect to their geographic transferability. The results clearly showed that geographic invariance is not an inherent property of U-NET and that the complexity of land-use patterns should not be neglected. At this point, we cannot propose a globally applicable model for segmenting CPSs, but we have used the Shannon entropy as an indicator of how well a model transfers to other geographic regions. However, this clearly requires further research.
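For reference, the Shannon entropy of a categorical distribution is H = -Σ pᵢ log pᵢ, where pᵢ is the share of class i. How the entropy was computed in this study (which land-use classes, spatial unit, and logarithm base) is not specified in this section. The sketch below assumes a flat list of land-use labels per area as hypothetical input and uses base 2 by default. Higher values indicate a more heterogeneous, and thus more complex, land-use pattern.

```python
import math
from collections import Counter


def shannon_entropy(class_labels, base=2.0):
    """Shannon entropy H = -sum(p_i * log(p_i)) of a categorical label list."""
    counts = Counter(class_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels given")
    return sum(-(n / total) * math.log(n / total, base) for n in counts.values())
```

An area covered by a single land-use class has an entropy of 0, whereas four equally frequent classes give 2 bits, the maximum possible for four classes.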
We assume that the difficulties and directions for further research presented here are relevant not only for mapping CPSs from Sentinel-2 data but also for many other applications of deep learning algorithms in remote sensing.