Deep Semantic Segmentation of Center Pivot Irrigation Systems from Remotely Sensed Data

The center pivot irrigation system (CPIS) is a modern irrigation technique widely used in precision agriculture due to its high water-use efficiency and low labor requirements compared to traditional irrigation methods. The CPIS is the leader in mechanized irrigation in Brazil, with growth forecast for the coming years. Therefore, the mapping of center pivot areas is a strategic factor for estimating agricultural production, ensuring food security, managing water resources, and conserving the environment. In this regard, digital processing of satellite images is the primary tool allowing regional and continuous monitoring with low cost and agility. However, the automatic detection of CPIS using remote sensing images remains a challenge, and much research has adopted visual interpretation. Although CPIS presents a consistent circular shape in the landscape, these areas can have high internal variation, with different plantations that vary over time, which makes detection difficult from spectral behavior alone. Deep learning using convolutional neural networks (CNNs) is an emerging approach that has revolutionized image segmentation, surpassing traditional methods and achieving higher accuracy and efficiency. This research aimed to evaluate the use of deep semantic segmentation of CPIS from CNN-based algorithms using Landsat-8 surface reflectance images (seven bands). The developed methodology can be subdivided into the following steps: (a) Definition of three study areas with a high concentration of CPIS in Central Brazil; (b) acquisition of Landsat-8 images considering the seasonal variations of the rain and drought periods; (c) definition of CPIS datasets containing Landsat images and ground truth masks of 256 × 256 pixels; (d) training using three CNN architectures (U-net, Deep ResUnet, and SharpMask); (e) accuracy analysis; and (f) large image reconstruction using six stride values (8, 16, 32, 64, 128, and 256).
The three methods achieved state-of-the-art results with a slight prevalence of U-net over Deep ResUnet and SharpMask (0.96, 0.95, and 0.92 Kappa coefficients, respectively). A novelty in this research was the overlapping pixel analysis in the large image reconstruction. Lower stride values had improvements quantified by the Receiver Operating Characteristic curve (ROC curve) and Kappa, and fewer errors in the frame edges were also perceptible. The overlapping images significantly improved the accuracy and reduced the error present in the edges of the classified frames. Additionally, we obtained greater accuracy results during the beginning of the dry season. The present study enabled the establishment of a database of center pivot images and an adequate methodology for mapping the center pivot in central Brazil.


Introduction
Irrigation is one of the leading technologies for increasing agricultural productivity, improving the yield of most crops by 100% to 400% [1]. Besides, irrigation promotes several benefits: Mitigation of the seasonal climatic factor and agricultural risk, agricultural expansion in arid and semi-arid regions, plantation diversity, a higher commercial value of products, reduction of unit production costs, stabilization of production and food prices, and improvement of the socio-economic conditions of farmers.
In recent years, Brazil has shown significant annual growth in the irrigated area, mainly in the Cerrado region. The Cerrado biome contains the largest proportion of areas irrigated by center pivots within Brazilian territory, ranging from 85.2% in 1985 to 78.3% in 2017 [2]. The irrigation areas are expanding to regions with a higher water deficit, requiring attention from water resources management. Among the types of irrigation, research has been developed to map the center pivot irrigation system (CPIS), which covers extensive areas. In Brazil, CPIS is the leader among mechanized irrigation, with an average increase of 85,000 ha per year in the last five years and 104,000 ha per year in the previous three years, and has the largest number of water concessions, with 30.1% of the total [3].
Therefore, irrigated agriculture increases food supply regularly throughout the year and ensures food security. However, irrigation is the largest consumer of anthropic water with values well above any other use, reaching 70% of the global annual water withdrawal from watercourses and groundwater [4,5]. Moreover, projections for global agricultural water demand in 2050 may represent the need for a 19% increase in irrigation [5]. Irrigated agriculture also has a considerable impact on the environment, such as erosion, pollution, soil salinization, and lowered groundwater tables, among others. Consequently, the continued population growth represents a challenge to adjust the demand for food production with the management of water resources and the protection of biodiversity [6,7]. Furthermore, the availability of freshwater in the irrigation sector is expected to decrease due to increasing competition with other multiple uses of water. Many surveys approach the problem of overexploitation of freshwater resources and the threat to food security [8][9][10][11]. An aggravating factor for the future scenario is the effect of climate change, which should demand an increase in the use of irrigation to maintain agricultural production [12,13].
Regional monitoring of irrigated areas with the acquisition of accurate information on their extent, spatial pattern, production, and productivity is essential to ensure food security, better water resources management, territorial planning, and economic development [14][15][16]. Davis et al. [17] point out that the reformulation of agricultural landscape configurations based on location and total water consumption would provide higher food production and better water use efficiency. Thus, remote sensing is a tool to monitor and plan spatiotemporal changes in crops, seeking to establish rules to minimize current and potential conflicts over water use. Remote sensing data have been used extensively to map irrigated areas since the 1970s and 1980s [18,19]. Different remote sensing data have been applied for the detection of irrigated areas, including optical data [20][21][22][23], radar data [24][25][26][27], or the combined use of the two types of data [28][29][30]. However, most CPIS mappings use the visual interpretation of circular features [3,19][31][32][33][34]. Center pivots do not always have similar behavior and may contain different plantations, making classification based on the spectral response of pixels or vegetation indices difficult. Therefore, the consistent automatic detection of center pivots from remote sensing data remains a challenge; solving it would enable faster mapping and avoid labor-intensive interpretation.
In this context, a method with great potential for automated detection is deep semantic segmentation. Semantic segmentation belongs to the field of computer vision, being a high-level task that seeks a complete understanding of the scene, including information on the object category, location, and shape [35,36]. According to Guo et al. [37], semantic segmentation differs from image classification because it does not need to know in advance what the concepts of the visual objects are. Semantic-level segmentation allows all parts of the object to interact more precisely, identifying and grouping the pixels of the image that belong together semantically. The aggregation of different parts that make up a whole requires a deep semantic understanding [37].
Several traditional computer vision and machine learning techniques have been overcome by deep semantic segmentation, a method that achieves greater accuracy and efficiency. Deep learning is an emerging approach that belongs to a subfield of machine learning and seeks to learn high-level abstractions in data using hierarchical architectures [38]. Different types of digital image processing using deep learning have obtained relevant results, for example, image fusion, image registration, scene classification, object detection, land use and land cover classification, segmentation, and object-based image analysis [39]. Classifications of remote sensing images using deep learning produced superior results in different types of mapping: Land-use and land-cover classification [40][41][42][43], urban features [44][45][46][47], change detection [48][49][50][51], and cloud detection [52][53][54][55], among others.
In this context, Zhang et al. [56] pioneered the use of CNNs for the automatic identification of CPIS. Their research [56] presents the following steps: (a) Collection of Red-Green-Blue (RGB) image training data with a size of 34 × 34 pixels for CPISs and non-CPISs, where each CPIS has 25 images with a small position difference relative to the center; (b) application of CNNs and identification of the center of each CPIS using a variation-based approach, where the pixel with the lowest variation value within the local area is detected as the central point; and (c) demarcation of CPIS using a fixed-size square around the center. However, the authors did not segment the entire field. Instead, they identified only the central point of each CPIS. The square demarcated from the center of the CPIS has a predetermined size and does not necessarily match the circumference of the CPIS. The survey also did not consider the seasonal variation of the plantations.
Changes in dry and rainy seasons in the Cerrado biome cause a significant variation in the phenology of CPIS agricultural cultivation and the surrounding natural vegetation. Therefore, this research sought to consider these seasonal differences in recognition of CPIS patterns. Another critical issue analyzed is the process of reconstructing the entire image. In sizeable remote sensing images, the segmentation is made by a sliding window with a lateral overlay for later image reconstruction. However, there is a knowledge gap of the effects of different overlapping intervals on reconstructed image quality, which the study sought to analyze. To compare the results with other surveys, we used CNN architectures used in other investigations with satellite images, such as De Bem et al. [48] and Yi et al. [57].
The present research aimed to evaluate deep semantic segmentation techniques for CPIS detection in central Brazil using Landsat-8 images. In this regard, the study assessed the following factors: (a) Different environments in central Brazil and seasonal changes (drought and rain); (b) three models based on CNN architecture (U-net, Deep ResUnet, and SharpMask); and (c) image reconstruction considering different overlapping ranges between 256 × 256 frames.

Materials and Methods
The image processing included the following steps (Figure 1): (a) Definition of three study areas with a high concentration of CPIS in central Brazil; (b) acquisition of Landsat-8 Operational Land Imager (OLI) images (30-m resolution) considering the seasonal variations of the rain and drought periods; (c) definition of CPIS datasets containing Landsat images and ground truth masks of 256 × 256 pixels; (d) training stage using three popular CNN architectures (U-net, Deep ResUnet, and SharpMask); (e) large image reconstruction using a sliding window algorithm; (f) analysis of seasonal effects in the detection of CPIS; and (g) accuracy analysis.
In general, object detection is challenging in large remote sensing images, which requires training samples of reasonable dimensions to keep processing and memory use manageable. The definition of the sample size must consider the characteristics of the object, such as the format, locations, and scales. Thus, a strategy for the classification of a large remote sensing image is to subdivide it into patches with the same size as the training samples and to use a sliding window algorithm with a determined stride (the step between consecutive patches, which sets their overlap). In this context, the present research performs numerous stride length comparisons to identify the optimal parameters for image reconstruction in center-pivot mapping. In addition, the research assesses the effects of phenological variations of natural vegetation and plantations during the rainy and dry periods on the CPIS detection process.
Remote Sens. 2020, 12, 2159

Study Area
The study sites cover three regions of central Brazil, presenting a high concentration of center pivots favored by the flat terrain that allows mechanization: (a) Western Bahia (835 center pivots); (b) Goiás/Minas Gerais (2672 center pivots); and (c) Mato Grosso (224 center pivots) (Figure 2). In these regions, water scarcity between May and September prevents the cultivation of several crops without supplemental irrigation.
The Western Bahia region, with flat topography and water availability (from rainfall, rivers, and groundwater), shows an increasing expansion of mechanized farming that replaced traditional agriculture [2][58][59][60][61] and an intensification of the implantation of center pivots [62]. Western Bahia had a significant increase in the irrigated area, growing from 9 center pivots in 1985 to 1550 center pivots in 2016, which has caused water conflicts since 2010 [63].
The Goiás/Minas Gerais region has one of the highest concentrations of center pivots in Brazil, reaching the number of hundreds. In this region, there is a conflict over the use of water between the sectors of irrigated agriculture, human consumption, and hydroelectric power generation. Several studies have already addressed the mapping of center pivot areas, the analysis of areas suitable for the expansion of irrigation, the demand for water for irrigation, and conflicts arising from competition for multiple water uses [64][65][66][67].
The state of Mato Grosso has favorable environmental factors for agriculture, being one of the leading agricultural producers of soy and corn [68][69][70][71]. In addition, Mato Grosso had the most significant center pivot increase in the 2010-2017 period (175% growth), consolidating itself as an essential Brazilian irrigation center that still has considerable expansion potential [72].

Dataset and Training Samples
In deep learning techniques, extensive and qualified datasets are critical for object recognition success and meaningful performance comparisons between different algorithms. Satellite images allow the creation of extensive datasets in space and time that capture the vast richness and diversity of objects present on the land surface, which results in high-performance object recognition. The challenge is to establish a dataset containing the satellite images alongside the corresponding ground truth images. The present research used data from the "Center Pivots in Brazil Irrigated Agriculture Survey", developed by the National Water Agency (ANA) [3], which contains all the vector data of the center-pivot polygons of the Brazilian territory in 2013/2014. The ANA survey extracted the vector polygons of CPIS from the visual interpretation of Landsat-8 OLI images. The preparation of ground truth images used this ANA database with some minor corrections when necessary.
For data compatibility with the ANA survey, we also used Landsat-8 surface reflectance images [73] from 2014 or 2015 for the training and validation data. In central Brazil, the climate has well-defined rainy and dry seasons, with distinct phenological behaviors [74,75]. This climatic variability is responsible for differences within the same type of vegetation or planting, such as regeneration, vegetative growth, flowering, fruiting, and seed dispersal. Therefore, the image acquisition covered dry and rainy months with the different responses of vegetation and crops. Table 1 lists the set of images used in the three study areas. In the analyzed temporal images, we observed changes in the presence of center pivots in specific locations, even over short periods (Figure 3). Thus, we checked and corrected the center pivot polygons to produce the ground truth images.
This research considered two classes of interest (center pivots and non-pivots). The dataset had 5000 frames of 256 × 256 pixels each (4200 with center pivots and 800 without center pivots) with an 80%-20% split (4000 frames for training and 1000 for validation). We evaluated three different neural network architectures (Deep ResUnet, U-net, and SharpMask) with the following hyperparameter configurations: (a) training for 200 epochs with callbacks, (b) batch size of 8, (c) Adam optimizer, and (d) Dice coefficient as the loss function. Additionally, each model's input layer was adjusted to support seven-channel Landsat images with 256 × 256 dimensions, resulting in a 256 × 256 × 7 input shape. For data processing, we used a computer equipped with an Nvidia GeForce RTX 2080 Ti graphics card with 11 GB of GPU memory, 16 GB of RAM, and an Intel Core i7-4770K CPU with a 3.5 GHz clock speed.
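As an illustration of the loss named above, the following is a minimal NumPy sketch of a Dice-based loss for a binary mask. The text states only that the Dice coefficient served as the loss function; the form 1 − Dice, the smoothing term `smooth`, and the function names are assumptions made for this example and may differ from the implementation used in the experiments.

```python
import numpy as np

# Input shape stated in the text: 256 x 256 pixels, seven Landsat-8 bands.
INPUT_SHAPE = (256, 256, 7)


def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Soft Dice coefficient between a binary mask and predicted probabilities.

    `smooth` is an assumed stabilizing constant that avoids division by zero
    when both masks are empty.
    """
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)


def dice_loss(y_true, y_pred, smooth=1.0):
    """Assumed loss form: one minus the Dice coefficient."""
    return 1.0 - dice_coefficient(y_true, y_pred, smooth)


# Toy ground truth: a square "pivot" inside one 256 x 256 frame.
mask = np.zeros(INPUT_SHAPE[:2])
mask[64:192, 64:192] = 1.0

loss_perfect = dice_loss(mask, mask)                 # prediction equals the mask
loss_empty = dice_loss(mask, np.zeros_like(mask))    # nothing detected
```

A perfect prediction drives the loss to zero, whereas a prediction that misses the pivot entirely gives a loss close to one, which is the behavior expected of an overlap-based loss on the two-class problem described here.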

Deep Learning Models
In the present research, we used three deep learning architectures: U-net [76], Deep ResUnet [77], and SharpMask [78]. U-net achieves significant results in semantic segmentation because of its ability to preserve essential features in the image, having two main parts: Contraction and expansion [76]. The name U-net comes from the symmetrical trajectory between both model parts (contraction and expansion) that describes a U-shaped architecture. The U-net model has a series of kernels that act as filters that map specific features. The contraction (encoder) stage of the architecture consists of cascaded downsampling, which reduces the image size and increases the number of filters. The expansion (decoder) stage consists of a symmetrical number of upsampling steps, returning the image to its original size and decreasing the number of filters to the number of outputs. Each downsampling stage has two Conv2D layers, two batch normalization layers, and two ReLU activation functions, ending with a MaxPooling layer. The upsampling stage has the same format, but instead of the MaxPooling layer at the end, there is an upsampling layer at the beginning. There are five downsamples, which means the image is reduced to 1/32 of its original size, and five upsamples. The architecture ends with a sigmoid activation function. U-net has been used for the semantic segmentation of targets in remote sensing images: Road network [79], water body [80], building extraction [46,81], raft aquaculture areas [82], and edge-feature-based perceptual hash [83].
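The 1/32 reduction mentioned above follows from five successive halvings of the frame side. The short sketch below traces the spatial size through the encoder; the pooling factor of 2 is an assumption (it is the factor implied by 2^5 = 32), and the function name is hypothetical.

```python
def encoder_feature_sizes(input_size=256, n_downsamples=5, pool=2):
    """Side length of the feature map after each downsampling stage.

    `pool=2` is an assumed pooling factor, consistent with the 1/32
    reduction after five stages described in the text.
    """
    sizes = [input_size]
    for _ in range(n_downsamples):
        sizes.append(sizes[-1] // pool)
    return sizes


sizes = encoder_feature_sizes()
reduction = sizes[0] // sizes[-1]  # overall reduction factor of the encoder
```

For a 256 × 256 frame, the bottleneck of the network therefore operates on an 8 × 8 grid, and the five upsampling stages restore the original 256 × 256 size.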
The deep residual U-Net (Deep ResUnet) combines the strengths of deep residual learning and the U-Net architecture [77] (https://github.com/nikhilroxtomar/Deep-Residual-Unet). The main advantages of the model are (a) the replacement of plain neural units by residual units as the basic block, and (b) the removal of the cropping operation, which is unnecessary, allowing better performance. The architecture consists of encoder and decoder blocks. The decoder block has three sets of batch normalization, ReLU activation function, padding, and convolutional block. The encoder block has the same structure, but with strides, so the image is downsampled. The architecture ends with a sigmoid activation function. The Deep ResUnet and its variations have been investigated for satellite image segmentation [77,84,85].
Facebook's SharpMask is a network that sharpens object segmentation masks [78], which suits our case because it deals with geometric objects. The architecture consists of convolutional and refinement blocks composed of three sets of Lambda, Conv2D, batch normalization, and ReLU activation functions. However, the refinement stage also adds activation functions. Every convolutional block is connected to a MaxPooling layer, and every refinement block is connected to an upsampling layer. We used four convolutional and refinement blocks that connect to a dense layer with 64 neurons and a ReLU activation function, followed at the end by the sigmoid activation function. De Bem et al. [48] used SharpMask to detect changes in the Amazon region.

Classified Image Reconstruction for Large Scenes
We developed a sliding window with the same dimensions as the training images that slides over the image for entire scene classification. Window movement can use different stride values in the horizontal and vertical directions. Figure 4 demonstrates the process of classifying large images with a sliding window. In the example, an 8 × 8 window slides over an image with a stride of two pixels. This process generates an overlap between consecutive frames whenever the stride is smaller than the window size (Figure 5). Thus, a set of values may be produced for each pixel, which can be used to improve target detection.

Figure 4. Classification of large images based on their subdivision into frames (panel labels: Remote Sensing Image, Sliding Window, Classified Image). The method uses a sliding window that runs over the image with a certain stride. In the example, the classification uses an 8 × 8 window that slides over an image with a two-pixel step.
The tests conducted in this research considered different stride values between two successive windows. Algorithms to reconstruct large images based on a sliding window with overlapping pixels have been applied to remote sensing data. Previous studies used the average values of overlapping pixels to reduce the impact of frame boundaries, which tend to have more errors [48,57]. Instead of using the average, we established a proportionality index of the number of times the pixel was classified as a center pivot. Thus, we increased the pixel counter by one when the result value was greater than 0.7, which means a high probability of having a center pivot. In the end, for each pixel, we had the ratio of the number of times the method identified the pivot to the number of overlapping windows, restricting the range of values between 0 and 1. The proportionality calculation takes into account the edge effect in the full image where necessary (Figure 5). A threshold value defines the center pivot and non-center pivot binary image.
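The proportionality index described above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the code used in the study: the names `window_starts`, `reconstruct`, and `predict` are hypothetical, the last window in each direction is shifted back so that it ends at the image border (one possible way of handling edges), and the final threshold of 0.5 is assumed because the text does not state its value. Only the 0.7 counting threshold comes from the text.

```python
import numpy as np


def window_starts(length, window, stride):
    """Start offsets along one axis; the last window is shifted to fit the edge."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)
    return starts


def reconstruct(image, predict, window=256, stride=128, prob_threshold=0.7):
    """Per-pixel ratio of 'pivot' votes to the number of overlapping windows."""
    if stride > window:
        raise ValueError("stride larger than the window would leave gaps")
    height, width = image.shape[:2]
    hits = np.zeros((height, width), dtype=np.float64)    # votes for center pivot
    visits = np.zeros((height, width), dtype=np.float64)  # windows covering a pixel
    for r in window_starts(height, window, stride):
        for c in window_starts(width, window, stride):
            prob = predict(image[r:r + window, c:c + window])
            hits[r:r + window, c:c + window] += prob > prob_threshold
            visits[r:r + window, c:c + window] += 1
    return hits / visits  # values between 0 and 1


# Toy example mirroring Figure 4: an 8 x 8 window with a two-pixel stride.
toy = np.zeros((16, 16, 1))
toy[4:12, 4:12, 0] = 1.0


def toy_predict(patch):
    """Stand-in for a trained model: returns the first band as probabilities."""
    return patch[..., 0]


ratio = reconstruct(toy, toy_predict, window=8, stride=2)
binary = ratio >= 0.5  # final threshold value is an assumption
```

Dividing by the per-pixel visit count, rather than by a fixed number of windows, keeps the ratio comparable between the image interior and its borders, where fewer windows overlap.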

Season Analysis
The central Brazil region presents a substantial phenological variation throughout the year. In the Cerrado biome, water scarcity is the primary climatic determinant of leaf phenology, establishing the period of leaf drying and the sprouting of new leaves. The Cerrado vegetation has herbaceous and arboreal strata. Herbaceous plants lose their leaves in the dry season and produce new leaves at the beginning of the rains. Woody plants have different strategies, in which the brevideciduous and deciduous species completely lose their foliage during the dry period, and the evergreen species keep their leaves throughout the year. In addition, the stages of planting cycles also interfere with the detection of CPIS. Therefore, we chose images with different photosynthetic responses to water stress, as shown in Table 2 and Figure 6. The area analyzed was the Goiás/Minas Gerais region, which has the highest concentration of CPIS, encompassing three Landsat scenes. The image with the highest percentage of photosynthetic vegetation was from May 2019, representing the end of the rainy season (Figure 6A). In contrast, the image from the critical dry period (August 2019) has few areas with photosynthetically active vegetation, limited to some CPIS and riparian forest (Figure 6C). Additionally, we added an image from the beginning of the dry season with intermediate behavior, from June 2018 (Figure 6B). One of the greatest difficulties in obtaining rainy season imagery is the presence of clouds, especially when analyzing large areas.

Accuracy Assessment
The accuracy analysis is crucial to establish the product quality and to compare classification algorithms. The accuracy assessment for the different methodological approaches adopted 1000 validation samples. We used metrics commonly adopted in object detection: total accuracy, precision, recall, F1, Kappa coefficient, and IoU [86][87][88][89][90]. Table 3 lists the equations for the accuracy metrics. In addition, to evaluate the image reconstruction with different overlaps, we used a new Landsat image (2018) and ROC-curve analysis. Finally, we performed an object-based precision analysis to assess the correctness of the number of center pivots, crucial information for public managers [91,92].

Table 3. Summary of accuracy metrics used in the object detection, where TP is true positive, TN is true negative, FP is false positive, and FN is false negative.
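The metrics named in Table 3 can all be computed from the four confusion-matrix counts. The following is a minimal sketch using the standard textbook definitions, which we assume correspond to those of Table 3; the function name `segmentation_metrics` and the example counts are hypothetical and are not results from this study.

```python
def segmentation_metrics(tp, tn, fp, fn):
    """Standard binary metrics from confusion-matrix counts (assumed definitions)."""
    n = tp + tn + fp + fn
    total_accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = total_accuracy
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return {
        "total_accuracy": total_accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
        "iou": iou,
    }


# Hypothetical counts, for illustration only.
metrics = segmentation_metrics(tp=80, tn=900, fp=5, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The sketch assumes that each denominator is non-zero, which holds whenever both classes are present in the reference and in the prediction.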

Comparison between CNN architectures from the validation samples
The training stage obtained low loss values (< 0.05) and high Dice coefficients (> 0.99) for all three methods, a very satisfactory result that demonstrates a high CPIS detection capacity. We attribute the efficiency of the CNN architectures to the great diversity of selected samples. This result indicates that all methods had an excellent ability to perform semantic segmentation of center pivots on multispectral data, considering different crops, shapes, and dimensions.
The accuracy scores were computed pixel-wise on the validation set (1000 images), totaling 65,536,000 pixels (256×256×1000). The results demonstrated that U-net had the best performance among the three networks (Table 4, Figure 7). Although the results were very similar, the residual blocks present in Deep ResUnet did not improve the performance in comparison to U-net, probably because the target has a constant geometric shape, varying only in size. Therefore, this result shows that simpler structures are sufficient for our analysis.
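For readers unfamiliar with the Dice coefficient reported above, the sketch below computes a soft Dice score between predicted probabilities and a binary mask. It is an illustrative, pure-Python version; the smoothing term `eps` and the function name are assumptions, and the exact formulation used during training is not specified here.

```python
def dice_coefficient(pred, truth, eps=1e-7):
    """Soft Dice between predicted probabilities and a binary mask (flat sequences)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    # eps avoids division by zero when both inputs are empty (assumed smoothing).
    return (2.0 * intersection + eps) / (sum(pred) + sum(truth) + eps)


# Toy example: two pivot pixels in the mask, one of them predicted.
truth = [1, 1, 0, 0]
pred = [1.0, 0.0, 0.0, 0.0]
dice = dice_coefficient(pred, truth)
print(f"Dice: {dice:.4f}")
```

A Dice coefficient of 1 indicates perfect overlap between prediction and mask, so values above 0.99 correspond to nearly identical masks.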


Results of Entire Classified Image in Different Seasons
Segmentation within independent frames tends to have more errors at the frame edges [57]. Therefore, reconstructing the image from classified frames with overlapping pixels can minimize these errors. To assess the effect of overlap on the result, we selected our best model (U-net) and six different stride values (8, 16, 32, 64, 128, and 256). This procedure used three independent Landsat images of 2560 × 2560 pixels from the Goiás/Minas Gerais region, acquired on 18 June 2018, 20 May 2019, and 24 August 2019. As expected, images with fewer overlapping pixels had many errors at the frame edges, while increasing the number of overlapping pixels resulted in well-marked pivots and significantly fewer errors.
The image reconstruction from sliding windows with overlapping pixels significantly improved the classification (Figure 8). The probability image became much closer to the ground truth image as the stride value decreased. Another interesting point is the precision of the method given the variety of spectral behaviors, textures, and internal arrangements within each center pivot. These nuances are complicated even for human recognition, evidencing the importance and precision of the automatic classification of CPIS. Despite the benefits of stride reduction, a considerable disadvantage of the overlapping windows technique is the longer processing time. Halving the stride along both the x and y axes increased the classification time fourfold. Classification with no overlapping pixels is fast, whereas classification with low stride values is slow: a 2560 × 2560-pixel image took about 30 s to classify with no overlapping pixels and about nine hours with a stride of eight. Figure 8 shows the procedures to generate the classified binary image from the probability image.
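The fourfold increase in processing time per halving of the stride follows from the number of windows to classify. As a rough check, the sketch below assumes one window position per stride step along each axis (as would occur with a padded image) and a constant cost per window; both are assumptions made for illustration, and the function names are hypothetical. Under these assumptions, 30 s at a stride of 256 scales to roughly 8.5 h at a stride of 8, in line with the reported time of about nine hours.

```python
def windows_per_axis(image_size, stride):
    """Window positions along one axis, assuming one position per stride step."""
    return image_size // stride


def estimated_seconds(image_size, stride, base_stride=256, base_seconds=30.0):
    """Scale the no-overlap runtime by the ratio of window counts (assumed constant cost)."""
    base_windows = windows_per_axis(image_size, base_stride) ** 2
    windows = windows_per_axis(image_size, stride) ** 2
    return base_seconds * windows / base_windows


for stride in (256, 128, 64, 32, 16, 8):
    n_windows = windows_per_axis(2560, stride) ** 2
    hours = estimated_seconds(2560, stride) / 3600
    print(f"stride={stride:3d}  windows={n_windows:6d}  estimated time={hours:.2f} h")
```

Because the window count grows with the inverse square of the stride, intermediate strides may offer a more practical balance between edge-error reduction and runtime.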
We obtained the optimal threshold for CPIS detection by testing a succession of threshold values and choosing the one with the greatest Kappa coefficient when compared to the ground truth image [93]. These successive comparisons generated a graphical overview of Kappa's trajectory for threshold values from 0 to 1 (Figure 9A1,B1,C1). This quantitative method reduces subjectivity in defining the optimal threshold value. The low optimal thresholds are due to the high selectivity of the index, which favors high-activation pixels that have a good likelihood of belonging to a center pivot.
The low threshold value reveals that the index reduces noisy points in the image, resulting in a lower rate of false negatives.
Figure 9 (partial caption): binary images with center pivots (red) and non-pivot areas (black) (A3, B3, and C3).
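The threshold search described above can be sketched as a simple sweep that keeps the threshold with the highest Kappa. This is an illustrative version under stated assumptions: the number of steps, the strict `>` comparison, the tie-breaking rule, and the toy values are all hypothetical and not taken from this study.

```python
def kappa(pred, truth):
    """Cohen's Kappa between two equally long binary sequences."""
    n = len(truth)
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_chance == 1.0:  # both sequences contain a single, identical class
        return 0.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def best_threshold(values, truth, steps=100):
    """Sweep thresholds in [0, 1] and return the one maximizing Kappa."""
    best_thr, best_kappa = 0.0, -1.0
    for i in range(steps + 1):
        thr = i / steps
        pred = [v > thr for v in values]
        k = kappa(pred, truth)
        if k > best_kappa:  # on ties, keep the first (lowest) threshold
            best_thr, best_kappa = thr, k
    return best_thr, best_kappa


# Toy per-pixel index values and ground truth, for illustration only.
values = [0.0, 0.05, 0.1, 0.3, 0.6, 0.9]
truth = [0, 0, 0, 1, 1, 1]
threshold, score = best_threshold(values, truth)
print(f"optimal threshold={threshold:.2f}, Kappa={score:.2f}")
```

Plotting the Kappa value obtained at each step against the threshold would give a trajectory of the kind shown in Figure 9A1,B1,C1.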
To evaluate the different stride values on the three distinct dates, we used the receiver operating characteristic (ROC) curve in a pixel-wise analysis (Figure 10). The ROC curve is a graphical representation of how well the model differentiates two classes, plotting (a) the false positive rate (FPR) against (b) the true positive rate (TPR). The closer the area under the curve (AUC) is to 1, the better the model performs. Additionally, comparing ROC curves from different periods shows how well the models differentiate classes with different inputs. As expected, the images with more photosynthetically active vegetation had better results (rainy season and beginning of the dry season), while the critical dry period had weaker results. Stride reduction increased the AUC scores in all three scenarios, with the highest value in the intermediate period (B) (0.984). In the rainy season (A), the results behaved similarly to the intermediate period, with slightly lower values. The critical dry season (C) had the most distinct response: the reconstruction without overlapping pixels had significantly worse outcomes than in the other two periods, but stride reduction significantly increased the ability to differentiate classes in a pixel-wise analysis.
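As an illustration of the AUC used above, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive pixel receives a higher score than a randomly chosen negative pixel, with ties counted as one half. This pairwise form is equivalent to the area under the ROC curve but is quadratic in the number of pixels, so it is meant only as a small worked example; the function name and the toy values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy pixel scores and labels (1 = center pivot), for illustration only.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC: {auc:.3f}")
```

For full 2560 × 2560-pixel images, a sort-based implementation from an established library would be the practical choice.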
The pixel-wise accuracy analysis presented similar results for the three dates. To better differentiate them, we performed an object-based accuracy assessment for the three Landsat images (2560 × 2560 pixels). This information is vital for public managers who seek to estimate the number of CPIS and evaluate the best scenarios to identify the center pivots. Table 5 lists the confusion matrix of the three dates. We identified: (a) 937 of 974 center pivots at the beginning of the dry season (96% OA); (b) 902 of 974 center pivots at the end of the rainy season (92% OA); and (c) 860 of 974 center pivots at the end of the dry season (88% OA). Even though the pixel-wise analysis had similar results, the object-based analysis shows a great difference among the three periods.
Table 5. Confusion matrix containing the number of correctly and incorrectly classified targets from the reconstructed image of the three periods using a stride of 8.
The beginning of the dry season yielded significantly better results than the rainy and critical dry periods. Although the classification errors in both of these periods stem from the similarity between center pivots and their surroundings, the source of that similarity differs. In the rainy season, photosynthetically active vegetation becomes very similar to center pivots with developing crops. In contrast, in the critical dry period, harvested fields under conservation tillage, a practice that reduces runoff and erosion, have a reflectance similar to that of dry vegetation.
Figure 11 shows zoomed areas from the Goiás/Minas Gerais region on the three dates, highlighting areas that were correctly classified only at the beginning of the dry period. In the rainy season, the red color associated with photosynthetically active vegetation shows that the center pivot and its surroundings had similar spectral behaviors. Likewise, in the critical dry period, non-photosynthetically active vegetation takes on a white color, homogenizing the CPIS with adjacent areas.
Figure 12 shows locations where only the images from the beginning of the dry season detected CPIS. The photosynthetically active vegetation is now gone, and the center pivots behave very similarly to their dry surroundings. Additionally, this kind of error is much more common than rainy season errors. The present research shows that identification in the intermediate season is optimal, since it benefits from photosynthetically active regions inside the pivots without the similarity to the surrounding vegetation.
Figure 13 shows a rare situation where the center pivots were correctly identified only in the rainy season image. The pattern of non-recognition was the same as in the previous cases, with very similar environments around the pivots. Figure 13B also presents a center pivot that was not identified in any of the periods.
Figure 14 shows that most center pivots classified as false positives had significant similarities with the class of interest, making detection controversial even for specialized professionals. Figure 14A-C present possible cases of abandoned center pivots, given the lack of planted area within the circular shape. Figure 14D illustrates a polygon that was erroneously mapped but has a shape similar to a center pivot. Even the prediction errors are therefore hard to adjudicate, which supports the state-of-the-art results for this classification problem. In addition, the error images (Figure 12) show an increase in errors along the circumference of the CPIS. This is a predictable result, since manual classification hardly achieves the standardization of automatic classification. Therefore, this type of effect should not be considered an incorrect classification.

Discussion
The present research shows state-of-the-art image segmentation results with high accuracy for CPIS detection in all deep learning models analyzed. This approach contributes significantly to faster CPIS identification when compared to mapping by visual interpretation. The vast majority of CPIS inventories consist of visual interpretation of circular shapes from satellite images. Rundquist et al. [19] systematized 14 years of CPIS inventory in Nebraska. The authors found that dry conditions in the state of Nebraska promoted a marked growth of CPIS during the period studied. Schmidt et al. [34] mapped the CPIS in Brazil's southeastern region in 2002. The research found a total of 4134 CPIS, with an estimated error greater than 5% due to cloud interference and lack of contrast between the irrigated area and its surroundings. Sano et al. [33] assessed the growth of CPIS in the Federal District of Brazil in the period 1992-2002 to estimate water demand. Over this period, the number of center pivots grew from 55 to 104. Ferreira et al. [32] mapped 3781 CPIS in the State of Minas Gerais (Brazil) for the year 2008 using images from the China-Brazil Earth Resources Satellite 2B Charge-Coupled Device (CBERS-2B/CCD) sensor. The most significant survey was conducted by the National Water Agency [3], which mapped the entire Brazilian territory in 2004 and provided the data used in this study.
U-net had slightly better metrics for our target than Deep ResUnet, in contrast with other segmentation studies with different targets [48,57,77]. De Bem et al. [48] compared these networks in deforestation change detection and obtained better Kappa results for ResUnet (0.94) than for U-net (0.91). In urban building detection, Yi et al. [57] also obtained better Kappa results for Deep ResUnet (0.9176) than for U-net (0.8709). Zhang et al. [77], in a road extraction analysis, used precision-recall breakeven points to evaluate the performance of the models, obtaining close values for Deep ResUnet (0.9187) and U-net (0.9053). The similarity between the Deep ResUnet and U-net results in the present research is probably associated with the training data. Our data differ in using seven-channel imagery and circular targets, which simpler structures may capture adequately and which would explain the similarity between the methods. Although SharpMask had the lowest accuracy, it trained faster than the other two networks.
The verified errors occur mostly in different border areas: (a) at the edge of the entire classification, due to a smaller number of overlapping pixels; (b) at the edge of the frames, because the geometric shape of the center pivots appears only partially; and (c) along the circumference of the center pivot, because of small divergences between the manual labels and the classified image. Previous research on large image segmentation used overlapping pixel values from the sliding window to attenuate frame edge errors [48,57]. A methodological novelty of this study was the quantitative analysis by ROC and AUC of the improvement in accuracy as the overlap area increases. We also proposed an index for the overlap data, considering the proportion of times the value was greater than 70%. In future research, errors of a semantic nature, such as the classification of abandoned center pivots, can be minimized with the use of time series, which can detect phenological changes in plantations.
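The overlap index mentioned above can be illustrated with a small sliding-window sketch: for each pixel, it records the proportion of overlapping windows in which the predicted probability exceeds 0.7. This is a toy version under stated assumptions; `toy_predict` is a hypothetical stand-in for the trained network that deliberately fails on window borders to mimic frame-edge errors, and the 8 × 8 image, 4 × 4 window, and probability values are invented for illustration.

```python
WINDOW = 4  # toy window size (the study uses 256 x 256 frames)


def toy_predict(top, left):
    """Hypothetical predictor: confident inside the pivot, except on window borders."""
    probs = []
    for r in range(WINDOW):
        row = []
        for c in range(WINDOW):
            inside_pivot = 2 <= top + r < 6 and 2 <= left + c < 6
            on_border = r in (0, WINDOW - 1) or c in (0, WINDOW - 1)
            row.append(0.9 if inside_pivot and not on_border else 0.1)
        probs.append(row)
    return probs


def overlap_index(image_size, window, stride, predict, threshold=0.7):
    """Per pixel: fraction of covering windows whose probability exceeds the threshold."""
    votes = [[0] * image_size for _ in range(image_size)]
    counts = [[0] * image_size for _ in range(image_size)]
    for top in range(0, image_size - window + 1, stride):
        for left in range(0, image_size - window + 1, stride):
            probs = predict(top, left)
            for r in range(window):
                for c in range(window):
                    counts[top + r][left + c] += 1
                    if probs[r][c] > threshold:
                        votes[top + r][left + c] += 1
    return [
        [v / n if n else 0.0 for v, n in zip(vote_row, count_row)]
        for vote_row, count_row in zip(votes, counts)
    ]


index_overlap = overlap_index(8, WINDOW, stride=2, predict=toy_predict)
index_no_overlap = overlap_index(8, WINDOW, stride=4, predict=toy_predict)
print(index_overlap[3][3], index_no_overlap[3][3])
```

In this toy case, a pivot pixel that lies on a frame border is missed entirely without overlap but receives a non-zero index once windows overlap. Such pixels are then recovered only by a low decision threshold on the index.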

Conclusions
This research focused on the detection of center pivots in three study areas in central Brazil, considering (a) the development of an extensive center pivot database that encompasses different environments in central Brazil and seasonal changes; (b) the evaluation of three models based on CNN architectures; and (c) the assessment of the procedure for image reconstruction, considering different overlap ranges. The results achieved state-of-the-art metrics, with the identification of nearly all center pivots. The training and test dataset had 5000 frames with ground truth obtained from visual interpretation of the images, which ensured reliable labels and improved the model's quality. The classification methods using U-net, Deep ResUnet, and SharpMask reached high values for the different accuracy metrics (total accuracy > 0.97, F-Score > 0.93, Recall > 0.90, Precision > 0.96, Kappa > 0.92, and IoU > 0.87). U-net had a slight advantage over Deep ResUnet. A significant contribution of this research was the proposed image reconstruction for large images, which considers different stride values for the moving window, allowing several overlays of the classified image and a better per-pixel pivot estimation. This procedure improves the final image. The results show that moving windows with few or no overlapping pixels have significant errors at the edges of the frames. However, we also identified a significant tradeoff in execution time: classification with no overlapping pixels takes about 30 s, while classification with a large number of overlapping pixels takes nearly 9 h. This performance could be improved using more powerful GPUs. Although we expected better results with stride reduction, the present research quantified this improvement. Classification using deep semantic segmentation is valuable because it replaces manual labor and increases speed.
Another crucial finding of this research came from the seasonal analysis, which provides evidence that the best time to identify center pivots is the beginning of the dry season, since they then show greater contrast with their surroundings, allowing the identification of nearly all center pivots present in the scene. This information has implications for agrarian and water management, energy consumption, and land use planning. Future studies should include the development of specific neural networks and tests with images of different sizes to determine whether the training frame size affects the result.