Spatiotemporal Derivation of Intermittent Ponding in a Maize–Soybean Landscape from Planet Labs CubeSat Images

: Climate change is increasing the frequency and intensity of heavy precipitation in the US Midwest, overwhelming existing tile drainage, and resulting in temporary soil ponding across the landscape. However, lack of direct observations of the dynamics of temporal soil ponding limits our understanding of its impacts on crop growth and biogeochemical cycling. Satellite remote sensing o ﬀ ers a unique opportunity to observe and analyze this dynamic phenomenon at the landscape scale. Here we analyzed a series of red–green–blue (RGB) and near infrared (NIR) remote sensing images from the Planet Labs CubeSat constellation following a period of heavy precipitation in May 2017 to determine the spatiotemporal characteristics of ponding events in the maize–soybean cropland of Champaign County, Illinois USA. We trained Random Forest algorithms for near-daily images to create binary classiﬁcations of surface water versus none, which achieved kappa values around 0.9. We then analyzed the morphology of classiﬁcation results for connected pixels across space and time and found that 2.5% (5180 ha) of this cropland was classiﬁed as water surface at some point during this period. The frequency distribution of areal ponding extent exhibited a log–log relationship; the mean and median areas of ponds were 1231 m 2 and 126 m 2 , respectively, with 26.1% of identiﬁed ponds being at the minimum threshold area of 45 m 2 , and 2.5% of the ponds having an area greater than 104 m 2 (1 ha). Ponds lasted for a mean duration of 2.4 ± 1.7 days, and 2.3% of ponds lasted for more than a week. Our results suggest that transient ponding may be signiﬁcant at the landscape scale and ought to be considered in assessments of crop risk, soil and water conservation, biogeochemistry, and sustainability.


Introduction
Recent developments in remote sensing platforms facilitate high spatiotemporal resolution detection and monitoring of potentially important but transient phenomena occurring on the Earth's surface, filling the gaps where previous platforms have been incapable of providing these data. Greater radiative forcing from greenhouse gases produces higher temperatures, increases evaporation rate, and increases the capacity of the atmosphere to carry water vapor, resulting in an increased number of heavy precipitation events and increased streamflow [1]. In the eastern half of the United States, these extreme events are particularly evident during the spring season [2,3]. This new hydrological regime of intense rainfall events and amplified seasonality is likely to increase the frequency of damaging flood events in the region [4], especially soil ponding. Transient soil ponding, besides being a nuisance to row crop agriculture, may potentially alter the biogeochemical balance of agroecosystems [5][6][7][8].
As the frequency and intensity of precipitation increase in the Corn Belt [9][10][11], ponding frequency and extent are also likely to increase as the climate continues to warm. These ponding events will likely have a greater effect on crop yields and biogeochemical cycles in the maize-bean agroecosystem as they increase in frequency, so detecting and monitoring these events will only become more important. Because ponding occurs repeatedly in the same locations and due to recent advances in precision agriculture [12], these areas may provide opportunities to grow perennial bioenergy crops that are more flood-tolerant than maize and soybean [13] or to develop artificial wetlands [14].
It is not uncommon to observe localized flooding-ponding-in tile-drained fields following heavy precipitation events, either because the drainage tiles are not functioning properly or because the inundation exceeds their capacity to rapidly remove surface water. Temporary ponds are inland surface water bodies that emerge and subside dependent on precipitation, soil, and topography [5]. The combination of typically poorly drained soils and the relatively wet climate of the US Corn Belt requires the installation of subsurface tile drain systems in agricultural fields to grow corn and soybeans [15,16]. Tile drainage helps remove surface water and lowers the water table, resulting in deeper-rooted crop plants that are more resilient to water stress. The Corn Belt states have drained approximately 70% of their pre-nineteenth century wetlands, much of it converted to intensive agriculture nowadays [17]; approximately 37% of total Corn Belt cropland is drained by tile drains or surface drainage [16]. Most of the drainage systems of this region were first installed from 1850 to 1930 with the support of federal and state policies [17]. Tile drains made from perforated plastic pipes, or clay for older installations, are buried 0.6 to 1.2 m deep and 10 to 30 m apart, and discharge into nearby streams or ditches [15]. This underground infrastructure is a critical part of Corn Belt hydrology, but the heterogeneous distribution of soil types, microtopography, and tile drainage challenges the development of comprehensive predictions for how the landscape will respond to climate change.
The general trade-off in remote sensing is between spatial and temporal resolution [18,19], a tradeoff that makes it difficult to use remote sensing to observe small, transient phenomena such as ponding. Platforms such as Landsat and Sentinel-2 have sufficient spatial resolution at 30 m and 10 m, respectively, to detect larger ponds [20,21], but with pass rates exceeding one week for Landsat and 5 days for Sentinel-2, they are unable to monitor the fast-changing dynamics of transient events like soil ponding. MODIS (Moderate Resolution Imaging Spectrometer) and VIIRS (Visible Infrared Imagine Radiometer Suite) have sufficient temporal resolution for daily imaging [22,23], but with their spatial resolutions exceeding 250-m, they can only capture extremely large ponds.
Planet's large constellation of high spatial resolution cube satellites, PlanetScope, overcomes the challenge of obtaining high spatiotemporal resolution imagery. The PlanetScope platform is a constellation of CubeSats in the 3U form factor (10-cm × 10-cm × 30-cm), with a target constellation size of 130 satellites [24]. PlanetScope is a commercial product that typically requires payment for access but is also available for research and education purposes at no charge through an application process. The full constellation will be capable of daily 3-m resolution red-green-blue near infrared (RGB-NIR) imaging for the entire Earth's surface, providing unprecedented spatiotemporal resolution and coverage among remote sensing platforms. Thus, the PlanetsScope constellation is capable of overcoming the spatial and temporal resolution trade-off of typical remote sensing platforms, which is essential for this transient ponding detection application.
The objective of this study was to develop a remote sensing method based on Planet CubeSat data to detect ephemeral dynamics of ponding in an agricultural landscape, and to quantify the spatial extent and duration of ponds after a major precipitation event. We automated ponding detection using Random Forest [25], a robust supervised machine learning algorithm. The ability to measure the size and duration of ephemeral ponds with high accuracy will enable monitoring the alterations to surface hydrology and their rapidly shifting dynamics as the climate system changes.

Identify the Extreme Rainfall Event for Analysis
We analyzed a series of RGB-NIR remote sensing images from the Planet Labs satellite constellation [26] following a period of heavy precipitation in May 2017 to determine the spatiotemporal characteristics of flooding events in the maize-soybean cropland of Champaign County, Illinois. This area is strongly dominated by row crop agriculture featuring an annual rotation of maize and soybean, and is representative of cropping systems in much of the Midwest US. We focused on early spring just prior to crop emergence when the probability of intense precipitation events is greatest and when detecting soil ponding is not obscured by plant cover.
To determine the size and duration of ponding events, we first identified high precipitation periods from the top 90th percentile of five-day rolling sums of daily precipitation for Champaign County in 2017. Precipitation data for Champaign County were obtained from the cli-MATE database [27]. We also constrained the selection to days outside of canopy closure during the maize and bean growing season. For our primary analysis of pond size and duration we selected the weeks following a high precipitation period (125 mm total accumulation/week) ending on 5 May 2017 ( Figure 1).

Preprocessing Planet Images
For each day following this high precipitation period in May 2017, we collected and processed 3-m resolution RGB-NIR imagery of Champaign County from the Planet Labs satellite constellation [26] ( Figure 2a). We had 9 days of usable imagery from 3 May 2017 to 15 May 2017, inclusive. We used Google Earth Engine [28] to mosaic the images, mask out obscurations such as clouds and shadows, and ensure alignment between these intermediate raster products (Table S1). We took the mean of the digital number values for all bands at each pixel, which we used to generate masks with simple cut-offs of high and low threshold values (Table S2). We removed digital artifacts around cloud shadows by adding a 150 m buffer around cloud shadows because the classifier detected some of these artifacts as false positives for the water surface. We masked out land cover not classified as maize, soybean, or water according to the 2017 Cropland Data Layer (CDL) [29]. We resampled the raster data and masks on export from Google Earth Engine so that all images completely matched in projection, extent, scale, and alignment of pixels, using the minimal bounding box containing Champaign County to define the origin and dimensions of raster extents. We included permanent water features in training the classifier to provide additional spectral data associated with water, but we later masked out these features before the spatiotemporal analyses.

Classifying Ponding Areas Using Random Forest
For each mosaicked image we trained Random Forest algorithms separately [25] with surface type-attributed polygons as labeling data to create binary classifications of surface water versus none using the RStoolbox (v. 0.2.6) [30] package in R (v. 3.5.1) (Figure 2b), which is a remote sensing-oriented wrapper for the caret package (v. 6.0.8) [31]. All the ground truth labeling data were generated through visual interpretation of images on each single day. The random forest classifier on a specific day was trained only using the labeling data collected on that day. To validate the surface water classification, mosaicked images were compared to a road survey of ponding. On 5 May 2017, we recorded the GPS locations of ponds visually identified from a road survey during the high precipitation period at the start of May 2017. The survey path length was approximately 33 km in the northwest quadrant of Champaign County, and included 39 recorded points of ponding locations. Water features identified from the road survey were readily distinguishable in the mosaicked rasters through false color imaging using the red, blue, and NIR bands for display, which we used to construct the training data for the classifications.
Using QGIS (v. 3.4.15) [32] we generated ground truth labels through visual interpretation of images on each single day. Two remote sensing experts worked on each image at the same time and doing cross-check with each other. Only those areas identified as ponding by both experts were used as areas of interest to generate labeling data. We randomly generated points in the areas of interest and produced polygons using the points as centroids. For each available day, we repositioned and reshaped the polygons to contain only water or to completely exclude water, and then attributed them with their binary land cover type. We used the classified polygons as the training data for the Random Forest algorithms in a 10-fold cross-validation to predict binary classifications of surface water versus none, reserving 10% of training samples, randomly selected, for each day to independently validate the predictions.
After we generated the classifications, we masked out permanent water features as defined by the CDL. We also compared the classified images with images from a dry period as a negative control to show that long-lasting ponds were not overrepresented by permanent features below the CDL's detection threshold. To produce the dry period classification, we used the same method as the wet period images but retained "water" land cover types from the CDL. We analyzed each classification result for accuracy primarily by calculating the associated kappa value from the independent validation. Kappa is formally described as: where p 0 is the overall accuracy, the total proportion of correctly predicted classifications, and p c is the proportion expected to agree by chance alone [33]. We used a threshold kappa of 0.7; we added more training polygons when a classification fell below the threshold kappa. The 95% confidence intervals (CI) of overall accuracy were estimated using the binomial test. Other performance metrics used in this study include no information rate, sensitivity, and specificity, with their definitions as follows: No in f ormation rate = 'sur f ace water total %, 'not water total % (2) Sensitivity = 'sur f ace water correctly predicted / 'sur f acewater total Speci f icity = 'not water correctly predicted / 'not water total (4)

Spatiotemporal Analysis of Ponding Dynamics
For the morphological analyses of classified rasters from the multiple days following a heavy precipitation event, we used the multidimensional image processing tools from the Scipy package (v. 1.1.0) in Python (v. 3.7.2) [34] (Figures 2c and 3). We stacked the days together sequentially with time along the z-axis (Figure 3a). We removed noise via binary opening with a 5-cell, 45 m 2 structuring element; the structuring element was arranged as a central cell with four directly adjacent cells in the x and y dimensions, so all non-contiguous cells smaller than the structuring element were dropped from the morphological analysis (Figure 3b). We gap-filled missing data, whether due to obscurations or unavailable image extent, by duplicating classification data from the next available day (Figure 3c). We labeled spatially and temporally contiguous cells to define features, each of which was labelled sequentially in order of detection along the x, y, and then z axes (Figure 3d). We described feature duration distribution by analyzing how many days each labeled feature lasted, starting on the first day with precipitation below 0.5 cm, 6 May 2017. We used the gap-filled data from 6 May 2017, to determine the areal distribution of ponds by analyzing the areal extent of each labeled feature. All code used for pre-processing and analysis is available at github.com/rfpaul/Flooding_Laszlo.

Results
For our independent validation, our kappa values across all analyzed days ranged from 0.823 to 0.9996, suggesting that we had excellent accuracy for the classifiers (Table 1). Kappa values tended to be lower on days with more cloud cover or haziness. Kappa and other accuracy measures were high in the latter days of the analysis period, likely due to a reduction in the presence of ambiguous pixels at the trained classifier's decision threshold of water surface versus saturated soil. Occlusions of the land surface from clouds and cloud shadows resulted in missing data and likely led to underestimation of the actual spatiotemporal dynamics of the analyzed region (Figure 4a-e).
The total area of Champaign County is 258,000 ha, of which 214,000 ha (82.9%) is in maize-soybean agriculture according to the 2017 CDL. A total of 208,000 ha (97.2%) of valid data within maize-soybean agriculture was available for our analyses (Figure 4a,b). For all days in the analysis, 5180 ha (2.5%) were classified as water surface at some point. For the gap-filled classification on 6 May 2017, the total area of water surface detected by the classifiers was 1497 ha (0.7%). Ponding was detected throughout the study area and showed high spatial variability (Figure 4d,e). The morphological analysis of the classifications allowed us to quantify the duration and area of ponds after the major precipitation event at the start of May 2017 (Figures 5 and 6). Because we did not have daily coverage of usable imagery for the county, the duration data are discontinuous. (c) Mapping the spectral bands to the false-color display as red to red, near infrared (NIR) to green, and blue to blue allows for the discrimination of water surfaces during the assembly of polygons for training the algorithms. (d) In the classification images, yellow represents valid, available data that were classified as not water and blue represents surfaces classified as water, with the same representation in (e) the gap-filled data. The gap-filled data excludes permanent water bodies identified from the 2017 CDL. Due to the missing data from masking out clouds and cloud shadows, the spatiotemporal analysis is an underestimate of the actual ponding dynamics of this landscape.  . The frequency of pond duration beginning 6 May 2017, the cessation of daily precipitation exceeding 1-cm during the analysis period, indicates that the duration of ponds is bimodal with a large number of short duration ponds and a second, small peak of long-lasting ponds. Frequencies are shown on the y-axis as a base-10 log scale. Gaps in the data occurred because daily satellite coverage was not available, or cloud cover was too great. The jump in the frequency of long-lasting ponds includes a small number of permanent water features below the detection threshold of the Cropland Data Layer. Table 1. Accuracy statistics of classifier per date at the start of May 2017. All data were trained on 10-fold cross-validations benchmarked on kappa values. These statistics are from independent validations performed by reserving 10% of the training polygons, randomly sampled, which were not used in the training or prediction steps. All of our kappa values are greater than 0.8, suggesting that we had excellent performance in the classifiers.

Date
Overall The frequency distribution of pond size classes exhibited a log-log relationship, with small ponds being extremely common and large ponds being rare ( Figure 5). The mean and median areas of ponds were 1231 m 2 and 126 m 2 , respectively. Of all identified ponds, 26.1% equaled the minimum threshold area of 45 m 2 , whereas 378 (2.5%) of the ponds had an area greater than 1 ha and the largest pond was 25.4 ha. Ponds lasted for a mean duration of 2.5 days, and 2.3% of ponds lasted for at least a week ( Figure 6). The frequency distribution of pond duration was best characterized as a semi-log relationship, where short duration ponds lasting 4 days or fewer were common and long-lasting ponds were relatively uncommon. The distribution of durations was bimodal, with a slight increase in long-duration ponds relative to medium-duration ponds lasting between 5 and 10 days.
Relating both duration and initial size in a bivariate hexagon bin plot indicated that small, short-duration ponds were the most common (Figure 7), as might be expected from the size and duration histograms alone. Interestingly, though, the stripe-like pattern along the y-axis suggests that size class did not appear to be a good predictor of duration outside of the bins representing small, brief ponds. The pattern expected intuitively was that larger ponds would have greater duration, but these data suggest that the duration of a pond was mostly independent of its initial size.

Discussion
Although a small percentage of the cropland we studied exhibited transient ponding, these events are still likely to be important in their effects at a landscape scale, especially in the context of terrestrial biogeochemistry. New platforms, such as the PlanetScope constellation that provide high resolution coverage of the Earth's surface, both spatially and temporally, can capture the visible dynamics of such transient, rapidly-changing events, whereas previous platforms, such as Landsat and MODIS, are constrained by the tradeoff between spatial resolution and temporal resolution [19]. This work continues the efforts of similar studies that used high resolution imagery to delineate flooding and inundation using platforms such as RapidEye and WorldView [35][36][37]. Limitations on this approach still remain, but these tools can provide new insights of processes and outcomes in monitoring floods, crop health, natural disasters, ecological productivity, phenology, and other widespread, spatially heterogeneous, and temporally dynamic events where useful imaging data would otherwise be unavailable or infeasible to collect. With these tools we can now quantify ponding frequency, duration, and spatial extent at the landscape scale as the frequency of extreme precipitation events is expected to increase.
The high spatiotemporal resolution of this data set illustrates a novel capability to detect transient events like ponding. Most ponds are small and of short duration, while fewer ponds are large and last a long time (Figure 7). The heterogeneity seen at the plot scale suggests that the status of tile drains is an important factor in the hydrological dynamics of the emergence and subsidence of transient ponds. Determining the locations and functional capacity of tile drains is nontrivial and poorly constrained [15,38], which presents challenges to spatially explicit, mechanistic hydrological model-based approaches in describing the landscape-scale behavior of ponds. Alternatively, the results from this approach can be used as a data source to aid the parameterization of spatially explicit hydrological models.
Although this approach and the data which facilitate it are a major improvement to observations of dynamic, transient events on the land surface, some limitations remain. These limitations likely cause underestimation, but an underestimate is much more valuable than lacking these data entirely, since previously available platforms are impractical or entirely insufficient for this purpose. Occlusion by clouds and cloud shadows (Figure 4b), like with nearly all remote sensing platforms, reduce the data availability of land surface imaging and may requiring gap-filling approaches [39,40]. Active radar platforms like Sentinel-1 can penetrate cloud cover [40], but we were unable to use Sentinel-1's data products for this analysis because the imagery was too noisy and attempts at smoothing were insufficient to recover consistently usable data. Because of occlusions and the PlanetScope constellation being below the 130-satellite threshold required for complete global, daily coverage, we did not have full coverage of the county for every day, which also drives a systematic error of our analysis towards underestimation. We expect that the number of long-lasting ponds is the only overestimated component of our analysis. A classification from a dry period in the middle of the growing season detected permanent water bodies below the threshold of the CDL, but their contribution was negligible.
This analysis was restricted to spring when the likelihood of extreme precipitation events and ponding is greatest [41,42]. However, ponding can occur throughout the year and the reliance on periods with low vegetation cover is another limitation to this approach. Full canopy closure in maize-soybean systems, depending on crop type and planting date, typically occur between the end of May and the start of July [43] and completely occludes the soil surface and any ponding occurring under the canopy. However, even the presence of some vegetation will reduce the ability to discriminate between classifications due to pixel mixing in the critical NIR band. The NIR band is an important source of spectral information for identifying a water surface, due to water's very low reflectivity in NIR [44], whereas vegetation is highly reflective in NIR [45], which is likely to increase the rate of false negatives for mixed pixels. Relatedly, areas with broad, shallow ponds that exhibit a mosaic-like pattern of saturated soil and water surface are also susceptible to pixel mixing that likely increases the false negative rate in the classification. Resolving these areas would require even finer spatial resolution at the submeter scale, which can be achieved with airborne surveys from unmanned aerial vehicles [46,47].
While our data show that only 2.5% of the maize-soybean cropland exhibited ponding, these areas could act as "hot spots" of biogeochemical transformation during "hot moments" following large precipitation events [48]. Reducing conditions associated with low oxygen availability can develop rapidly following soil inundation [49], potentially stimulating anaerobic biogeochemical processes that produce potent greenhouse gases such as nitrous oxide and methane. Clay swelling [50] and the reduction of poorly crystalline iron minerals [51][52][53] during periods of ponding can release organic matter previously protected from decomposition in organo-mineral complexes. This increase in substrate availability can stimulate carbon dioxide production after each ponding event, possibly leading to increases in annual soil carbon dioxide emissions associated with increased ponding frequency despite suppressed soil carbon dioxide emissions during periods of ponding. In addition, legacy effects of drainage history can lead to different mechanistic controls on soil greenhouse gas production in areas prone to ponding versus not [54], thereby affecting ecosystem-scale greenhouse gas fluxes.

Conclusions
High spatiotemporal resolution RGB-NIR imagery is effective for accurately detecting and monitoring flooding events, which are widespread and long-lasting in Champaign County and likely exhibit similar dynamics as other regions of the US Corn Belt with similar soil types and topography. Currently, transient events such as soil flooding may have substantial implications for landscape-scale biogeochemical cycling, soil and water conservation, agriculture, and sustainability. In the near future, the frequency and spatial extent of these events will likely increase as patterns of rainfall continue to intensify [16,38,39,52,53]. Ponding may result in delayed planting, loss of input and yields, increased nutrient runoff, intensification of soil organic matter mineralization, and altered greenhouse gas fluxes. Future work directly based on these results will focus on quantifying the influence these events have on landscape-scale net greenhouse gas fluxes via a process-based biogeochemistry model.