Automatic Detection of Open and Vegetated Water Bodies Using Sentinel 1 to Map African Malaria Vector Mosquito Breeding Habitats

Providing timely and accurate maps of surface water is valuable for mapping malaria risk and targeting disease control interventions. Radar satellite remote sensing has the potential to provide this information, but current approaches are not suitable for mapping African malarial mosquito aquatic habitats, which tend to be highly dynamic, often with emergent vegetation. We present a novel approach for mapping both open and vegetated water bodies using serial Sentinel-1 imagery for Western Zambia. This region is dominated by the seasonally inundated Upper Zambezi floodplain, which suffers from a number of public health challenges. The approach uses open source segmentation and machine learning (extra trees classifier), applied to training data that are automatically derived using freely available ancillary data. Refinement is implemented through a consensus approach and Otsu thresholding to eliminate false positives due to dry flat sandy areas. The results indicate a high degree of accuracy (mean overall accuracy 92%, s.d. 3.6), providing a tractable solution for operationally mapping water bodies in similar large river floodplain unforested environments. For the period studied, 70% of the total water extent mapped was attributed to vegetated water, highlighting the importance of mapping both open and vegetated water bodies for surface water mapping.


Introduction
Providing timely and accurate maps of surface water is valuable for a range of applications. Mapping flood water using satellite earth observation technologies is a routine practice, assisting emergency services as well as informing flood mitigation strategies, e.g., [1][2][3][4][5]. Additionally, satellite imagery has been successfully used to delineate and characterise wetland areas, providing spatial information for monitoring and managing wetland habitats, e.g., [6][7][8][9][10]. There is also considerable potential for using satellite-derived water body products for operational use in public health.
When water bodies are vegetated, they are often characterised by relatively high backscatter due to either volumetric scattering from the canopy or the double-bounce backscattering effect [20,38,39]. The strength of this response depends on other factors, such as the degree of canopy penetration, which is related to polarisation and wavelength, and the density of the vegetation canopy [20,38,40,41,42,43]. Schlaffer et al. [7] generated thematic maps of wetlands in Zambia using multi-temporal radar imagery, exploiting the seasonally variable backscatter characteristics of different land cover types, including open water, vegetated water and dry land. Despite the success of this approach, like many of the flood mapping systems discussed previously, it relies on dry season imagery to provide a reference for change. As such, this approach is not suitable for mapping aquatic mosquito habitats in the dry season.
A successful attempt to extract both open and vegetated water bodies has been demonstrated by Plank et al. [44]. Their approach combines C-band Sentinel-1 and L-band PALSAR-2 imagery: Sentinel-1 is used to map open water, while PALSAR-2 is used to map vegetated water bodies, exploiting the greater canopy penetration offered by longer-wavelength, multi-polarisation L-band imagery. However, the reliance on PALSAR-2 data limits the application of this approach to operational mosquito vector-borne disease control because the imagery is not currently freely available. Until L-band imagery becomes widely available, it would be beneficial to explore the use of Sentinel-1 imagery to map both open and vegetated water bodies, despite its limited canopy penetration. Tsyganskaya et al. [45] demonstrated a successful approach for detecting both open and vegetated water bodies in the Caprivi Strip, Namibia using Sentinel-1 imagery. This was achieved by generating metrics characterising temporal change in backscatter, enabling the delineation of temporary vegetated water bodies. Despite its success (overall accuracy: 85%), this approach is tailored towards the detection of water during flood events: its reliance on reference dry season imagery means that it, too, is not readily applicable to detecting water bodies during the dry season, which is of key importance for public health applications.
For a given radar wavelength, the high double-bounce backscatter signal from the vegetation-water surface is lost once canopy density reaches a critical level [40,46]. Vegetated water bodies can therefore be expected to be detected in a radar image only up to a particular level of canopy closure. For the relatively low-penetrating C-band Sentinel-1 imagery, detectable vegetated water bodies are expected to have relatively low canopy density [20,40,47]. This represents a significant drawback for water body mapping in many floodplain environments, e.g., dense-canopy mangrove forests in tropical coastal floodplains that cannot be penetrated by C-band radar [48]. Although water bodies with dense vegetation canopies are likely to be omitted by general water body mapping programmes, in terms of mosquito control, water bodies with relatively sparse vegetation are characteristic of important malaria vector (Anopheles gambiae sensu lato) aquatic habitats in sub-Saharan Africa [35,37,49,50] (Figure 1). Furthermore, other vector species (e.g., Anopheles coustani, An. squamosus, An. ziemanni) are found in even more densely vegetated water bodies [51][52][53][54], reinforcing the need to include these in the classification.
Classification approaches using machine learning have proven successful in mapping open and vegetated water bodies using radar imagery [8,25,44,55]. Machine learning classification is typically supervised, requiring sufficient training data that are traditionally labour intensive to collect. However, several studies have developed methods for automatically extracting water body training data by refining existing datasets such as the MODIS-derived static water mask and Pekel et al.'s [56] Landsat-derived water occurrence layer [25,55,57]. This approach has strong potential for developing an operational monitoring technique, but so far the use of pre-existing datasets for training machine learning classifiers has not been applied to vegetated water bodies.

Figure 1.
Example malarial mosquito aquatic habitats including (A) saturated areas driven by topographic convergence of subsurface moisture; (B) depressions within floodplains of active river channels with well-developed levees; (C) water bodies within relict palaeochannel systems; (D) pools located in perennial or seasonally active river channels; (E) spring-fed pools; and (F) river flood inundation zone. Images A-E taken within a malaria endemic area in Tanzania, from Hardy et al. [35] and F in Zambia. Each site contained malaria vector larvae.
One of the key challenges in mapping open water using radar imagery is its similarity in backscatter to flat, unvegetated or sparsely vegetated areas [29,58]. Such errors have been handled in flood mapping routines through the use of a mask of frequently low-backscattering areas, or Sand Exclusion Layer [29]. However, this approach relies on relatively coarse-scale products (e.g., the Shuttle Radar Topography Mission (SRTM) Water Body Data (SWBD), the Water Indication Mask (WAM) [58], or the global water occurrence layer [56]) to delineate permanent water, which would otherwise also be contained within the mask. These datasets are too coarse to account for narrow canals or waterways in floodplain areas that often function as main transport routes for health practitioners. Equally, small water bodies can often act as important aquatic habitats for vector mosquitoes [49]. Additionally, some water mask products, e.g., [56], include large amounts of missing data due to persistent cloud cover. One approach that has not been explored for reducing confusion between open water and smooth unvegetated surfaces is the use of a consensus approach to classification (e.g., see Dargie et al. [59]), applying a measure of certainty to refine the classification.
This study aims to develop a novel approach for mapping both open and vegetated water bodies using single-epoch Sentinel-1 images. Automatically derived training data will be used to drive an object-oriented machine learning classification, which is run multiple times for the same image, with the resulting classification certainty used to refine the final classified map. The approach is developed for a malaria-endemic region in sub-Saharan Africa with very high prevalence of the disease and a number of other public health challenges relating to surface water extent, not least the challenge of providing health care to rural communities whose access to health facilities is frequently compromised by significant flooding.

Study Site
The water body mapping approach was developed using the Barotse Floodplain of the Zambezi River in Western Province, Zambia as a test case (Figure 2). The floodplain is located on a deposit of Kalahari sand [60] covering an area of ~5500 km², though the inundated area is estimated to extend to a maximum of 10,750 km² when the floodplains of tributaries such as the Luena River are included [61]. An outcrop of resistant basalt south of the floodplain constricts the Zambezi channel, promoting the upstream development of these extensive wetlands. The Upper Zambezi catchment extends into Angola and North-west Zambia; river discharge is distinctly seasonal, reflecting the increased rainfall to these areas between October and April from the shifting Inter-Tropical Convergence Zone (ITCZ). Following the rainy season peak in January-February, seasonal flood peaks arrive in April and recede through May to July, with flows immediately downstream at Victoria Falls ranging between 400 m³ s⁻¹ and 2300 m³ s⁻¹ [62]. Seasonal inundation to the edge of the escarpment occurs annually; however, a pronounced decadal periodicity in the magnitude of inundation is observed, superimposed on a gradually increasing trend [63]. The floodplain itself extends up to 50 km in width following the course of the Zambezi for 230 km (Figure 2). As with all such large floodplains, complex hydrological regimes are observed as flows interact with topographically complex surfaces producing an extensive collection of geomorphological features and processes [64]. The Zambezi itself exhibits an anabranching planform, as seen with many such large rivers [65], with both active and abandoned accessory channels visible throughout the floodplain. The Kalahari sands are interspersed with silt and mud deposited by flood events [66] through a complex depositional web [67].
Yet water storage on the floodplain is spatially and temporally variable and is not supplied exclusively from overbank flooding [68]. Tributaries from the surrounding landscape enter the floodplain and run along the fringes of the escarpment on fine muddy deposits converted into rice paddies. Several of these tributaries are supplied by small shallow waterlogged depressions that form on peat deposits elevated above the Barotse floodplain, known as 'dambos' [69]. Connectivity of the floodplain to these surface flows and also subsurface flows through the floodplain plays an important role in surface water dynamics [70]. As such, a complex mosaic of dry season water bodies acting as potential mosquito refugia is observed. Many such water bodies, including the extensive Luena flats, where the anabranching tributary meets the left bank of the main channel floodplain, are covered with riparian vegetation.
Remote Sens. 2019, 11, 593
The floodplain has high economic value: it is productive for rearing cattle and crop cultivation and supports a human population of at least 225,000 [66]. Malaria is endemic throughout Zambia, with transmission in the Western Province occurring year-round with a distinct seasonal maximum after the flood peak [71]. Vegetation within the floodplain typically consists of seasonally flooded grasses locally termed 'mulapo' or seasonally cultivated grasses known as 'sitapa' [72]. During the wet season, matted floating grasses and other wetland vegetation types are abundant in inundated areas, including papyrus, rushes, reeds, water hyacinth, water lettuce, kariba weed and water fern [73] (examples shown in Figure 3). Forested areas are not extensive, with wooded areas (typically isolated stands of mango trees) being mainly limited to floodplain islands known as 'mazulu' [72].

Datasets and Classification Procedure
Dual polarised (VV and VH) Sentinel-1 Ground Range Detected (GRD) scenes were acquired from Google Earth Engine (https://earthengine.google.com/). All scenes were pre-processed using the following steps: (1) thermal noise removal, (2) radiometric calibration and (3) terrain correction. Within the Google Earth Engine JavaScript API, the VV/VH ratio was calculated and added to each image before export. A total of 59 scenes covering the period 2016-2018 were downloaded. A summary of the datasets used for the classification and validation is given in Table 1. An overview of the classification approach is given in Figure 4.
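The ratio band itself is simple arithmetic. As a minimal NumPy sketch of the same step applied to exported arrays (rather than the Earth Engine JavaScript API used in the study), the function below stacks VV, VH and VV/VH. It assumes both bands are in dB and that the ratio is taken directly on the dB values; the text does not say which, but a linear VV/VH ratio below the 0.5 threshold used later for vegetated water would be unusual, since VV backscatter typically exceeds VH. The function name `add_ratio_band` is hypothetical.

```python
import numpy as np


def add_ratio_band(vv_db, vh_db):
    """Return a (3, rows, cols) stack of VV, VH and the VV/VH ratio.

    Assumption: inputs are backscatter in dB on the same grid, and the ratio
    is computed directly on the dB values (not on linear intensities).
    """
    vv = np.asarray(vv_db, dtype=float)
    vh = np.asarray(vh_db, dtype=float)
    if vv.shape != vh.shape:
        raise ValueError("VV and VH must be on the same grid")
    # Pixels where VH is exactly 0 dB would divide by zero; mark them NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(vh != 0, vv / vh, np.nan)
    return np.stack([vv, vh, ratio])


# Example: an object with relatively strong VV (double bounce) vs a darker one.
stack = add_ratio_band([[-8.0, -24.0]], [[-20.0, -30.0]])
print(stack[2])  # ratio band: [[0.4 0.8]]
```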

All subsequent processing was scripted in Python using the open source libraries RSGISLib [75], GDAL and scikit-learn [76]. For each VV, VH, VV/VH stacked image, a 3 × 3 adaptive Lee filter was applied. This filter reduces speckle noise using the coefficient of variation within a moving window, while retaining textural information by classifying each pixel as homogeneous (the pixel is replaced by the window average), heterogeneous (a weighted window average is used) or a point target (the pixel value is retained) [77,78]. The filtered images were then segmented using the Shepherd segmentation [79], and statistics (minimum, maximum, mean and standard deviation) for each band (VV, VH and the VV/VH ratio) were added to the raster attribute table. Visual analysis showed that the segmentation was able to simplify the radar images into a manageable number of image objects while retaining much of the backscatter information regarding different land cover types, including water bodies, although some very narrow waterways (typically < 2 m) were lost.
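The three-way pixel behaviour described above (homogeneous, heterogeneous, point target) matches the enhanced Lee filter formulation. We have not confirmed which implementation the study used, so the following is an illustrative NumPy sketch rather than the authors' code. It assumes linear-intensity input, an equivalent number of looks (`enl`) of 4.4 and a damping factor of 1.0; both values are illustrative assumptions, not parameters reported in the study.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def enhanced_lee(img, size=3, enl=4.4, damping=1.0):
    """Enhanced (adaptive) Lee speckle filter for a 2-D linear-intensity image.

    Each pixel is classed by its local coefficient of variation ci:
      ci <= cu          homogeneous   -> replaced by the window mean
      cu < ci < cmax    heterogeneous -> weighted blend of mean and pixel
      ci >= cmax        point target  -> original value retained
    enl and damping are illustrative defaults, not values from the study.
    """
    img = np.asarray(img, dtype=float)
    pad = size // 2
    # size x size neighbourhood of every pixel (edges padded by replication).
    win = sliding_window_view(np.pad(img, pad, mode="edge"), (size, size))
    mean = win.mean(axis=(-2, -1))
    std = win.std(axis=(-2, -1))
    with np.errstate(divide="ignore", invalid="ignore"):
        ci = np.where(mean > 0, std / mean, 0.0)
    cu = 1.0 / np.sqrt(enl)          # expected speckle variation in a flat area
    cmax = np.sqrt(1.0 + 2.0 / enl)  # beyond this, treat as a point target
    # Weight given to the local mean in heterogeneous areas (between 0 and 1).
    w = np.exp(-damping * (ci - cu) / np.clip(cmax - ci, 1e-12, None))
    out = np.where(ci <= cu, mean, w * mean + (1.0 - w) * img)
    return np.where(ci >= cmax, img, out)


# Usage: smooth a speckled, homogeneous test patch (gamma-distributed intensity).
noisy = np.random.default_rng(0).gamma(4.4, 1 / 4.4, size=(50, 50))
smoothed = enhanced_lee(noisy)
```

In this formulation, an isolated bright scatterer (such as a building) pushes the local coefficient of variation above `cmax` and is left untouched, while flat areas are averaged.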
Training data were drawn from the segmented image and stored in the raster attribute table using the image statistics as well as information from ancillary datasets, including the global water occurrence layer [56] and SRTM terrain derivatives. This process was carefully guided by field observations made during fieldwork campaigns in September 2018, May 2018 and March 2017, including geo-tagged photographs, broad landscape characterisation (presence of water, nature of vegetation, etc.) and, in some instances, point measurements of water depth and vegetation height. Image objects were assigned as open water training data where (a) the VV backscatter was less than −18 dB and (b) the global water occurrence layer [56] identified the object as permanently wet. Training data for vegetated water bodies were selected where the VV/VH ratio was less than 0.5 and the global water occurrence layer identified the object as wet for at least two months of the year. The optically derived global water occurrence layer is designed to map open water only, but we found that some permanently wetted vegetated areas, where water occurrence was recorded for at least two months of the year, were also identified within this dataset. As such, we were able to use this dataset, alongside the VV/VH ratio, to delineate sufficient vegetated water training objects.
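The rule-based labelling of water training objects can be sketched with per-object means computed from a segment-ID raster. In this illustrative NumPy version, `wet_months` is assumed to be a per-pixel count of months per year in which water is observed (for example, a seasonality layer accompanying the water occurrence data [56]), with 12 months treated as "permanently wet". The class codes, function names and the use of object means for the ancillary layer are our assumptions, not details given in the text.

```python
import numpy as np

UNLABELLED, OPEN, VEGETATED = 0, 1, 2  # hypothetical class codes


def object_means(segments, band):
    """Mean of `band` within each segment ID (IDs are 0..max, non-negative)."""
    ids = np.asarray(segments).ravel()
    counts = np.bincount(ids)
    sums = np.bincount(ids, weights=np.asarray(band, dtype=float).ravel())
    with np.errstate(divide="ignore", invalid="ignore"):
        return sums / counts  # NaN for IDs with no pixels


def water_training_labels(segments, vv_db, ratio, wet_months):
    """Label objects as open or vegetated water training data (rules as in text)."""
    vv = object_means(segments, vv_db)
    rat = object_means(segments, ratio)
    months = object_means(segments, wet_months)
    labels = np.full(vv.shape, UNLABELLED)
    # Open water: VV < -18 dB and wet in all 12 months (assumed "permanently wet").
    labels[(vv < -18.0) & (months >= 12)] = OPEN
    # Vegetated water: VV/VH ratio < 0.5 and wet for at least two months.
    veg = (rat < 0.5) & (months >= 2) & (labels == UNLABELLED)
    labels[veg] = VEGETATED
    return labels
```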
Training data for non-wetted (dry) objects were extracted where the slope angle was greater than 1 degree. Slope angle was generated from the SRTM 3 arc-second product with a 5 × 5 smoothing filter applied to suppress fine-scale variations. This mask was refined by selecting only areas where the terrain derivative Height Above Nearest Drainage (HAND) [80] was greater than 30 m, to ensure that areas at the fringes of floodplains (often marked by a defined change in elevation or escarpment) were not included. HAND has been used by other authors [21,44,81] to help eliminate false positive water body detections located above the drainage line.
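A minimal sketch of the dry training mask follows. It assumes the SRTM DEM has already been projected to a square grid of roughly 90 m pixels (3 arc-seconds is about 90 m near the equator) and that a HAND raster on the same grid is available. The 5 × 5 moving mean stands in for the smoothing filter, whose exact type the text does not specify, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def smoothed_slope_deg(dem, pixel_size=90.0, smooth=5):
    """Slope in degrees after a smooth x smooth moving-mean filter of the DEM."""
    pad = smooth // 2
    padded = np.pad(np.asarray(dem, dtype=float), pad, mode="edge")
    dem_s = sliding_window_view(padded, (smooth, smooth)).mean(axis=(-2, -1))
    dz_dy, dz_dx = np.gradient(dem_s, pixel_size)  # rise per metre along rows/cols
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


def dry_training_mask(slope_deg, hand_m, min_slope=1.0, min_hand=30.0):
    """Dry training pixels: slope > 1 degree and HAND > 30 m (thresholds from the text)."""
    return (np.asarray(slope_deg) > min_slope) & (np.asarray(hand_m) > min_hand)
```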
The open water, vegetated water and dry training data were used to train an Extra Trees classifier (otherwise known as 'Extremely Random Forests'). This algorithm draws decision tree split thresholds at random within the range of each candidate feature [76] and has been shown to outperform Random Forests as well as being computationally more efficient [82]. Machine learning classifications can be affected by imbalanced training data, with under-prediction reported for classes with relatively few samples, typically less abundant classes [83]. To overcome this, the training samples were balanced before the classifier was initiated by randomly selecting objects from each class to match the class with the fewest training samples. Similarly, the choice of parameters for the machine learning algorithm can affect the resulting classification [83]. In this study, a cross-validation grid search was used to determine optimal parameters for the Extra Trees classifier, testing the relative performance of parameters such as the number of trees and the maximum number of features to consider when looking for the best split.
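The class balancing and parameter search can be sketched with scikit-learn's `ExtraTreesClassifier` and `GridSearchCV`. Random undersampling to the size of the smallest class follows the description above. The parameter grid, number of folds and random seeds are illustrative assumptions, since the text does not report the values searched, and the function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV


def balance_by_undersampling(X, y, rng):
    """Randomly keep, for every class, as many samples as the rarest class has."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes]
    )
    return X[keep], y[keep]


def fit_balanced_extra_trees(X, y, seed=0):
    """Balance the training objects, then grid-search Extra Trees parameters."""
    rng = np.random.default_rng(seed)
    Xb, yb = balance_by_undersampling(np.asarray(X), np.asarray(y), rng)
    search = GridSearchCV(
        ExtraTreesClassifier(random_state=seed),
        # Illustrative grid: number of trees and features considered per split.
        param_grid={"n_estimators": [50, 100], "max_features": ["sqrt", None]},
        cv=3,
    )
    search.fit(Xb, yb)
    return search.best_estimator_
```

Undersampling discards some training objects from the larger classes. Because the objects are drawn automatically from ancillary data, the loss is cheap, and it avoids the under-prediction of rarer classes noted above.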
For each image stack, the balanced Extra Trees classifier was run 50 times. Tests demonstrated that 50 runs provided a balance between computational demand and a sufficient number of instances to determine classification certainty. The classification result for each run was stored in the raster attribute table, and the mode was calculated, providing the most commonly assigned class per object. The percentage of runs in which the mode was predicted was also calculated, generating a metric of class certainty. For an object to be labelled as open water or vegetated water, there needed to be 100% agreement with the mode. In effect, any object for which there was uncertainty over whether it was a water body (open or vegetated) was relabelled as dry. In doing so, we found that the confusion between open water and other low-backscattering objects was largely removed. Similarly, the classification certainty refinement helped to remove much of the confusion between vegetated water bodies and other high-backscattering objects such as rough water and urban settlements.
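The consensus refinement reduces to counting votes per object. In the sketch below, each row of `runs` would hold one run's predicted class for every object (for example, from the balanced classifier refitted with a different random seed); the class codes and function name are hypothetical. Ties are broken toward the lowest class code, which here is dry. The text does not say how ties were handled, but they cannot change the refined labels, because a tied object never reaches 100% agreement.

```python
import numpy as np

DRY, OPEN, VEGETATED = 0, 1, 2  # hypothetical class codes


def consensus_labels(runs):
    """Mode, % agreement with the mode, and certainty-refined labels per object.

    runs: (n_runs, n_objects) array of non-negative integer class codes.
    Water objects (open or vegetated) are kept only with 100% agreement;
    anything less is relabelled as dry.
    """
    runs = np.asarray(runs)
    n_classes = runs.max() + 1
    # votes[c, j] = number of runs assigning class c to object j
    votes = np.stack([(runs == c).sum(axis=0) for c in range(n_classes)])
    mode = votes.argmax(axis=0)  # ties go to the lowest code (DRY)
    certainty = 100.0 * votes.max(axis=0) / runs.shape[0]
    refined = np.where((mode != DRY) & (certainty < 100.0), DRY, mode)
    return mode, certainty, refined


runs = np.array([[1, 1, 2, 0],
                 [1, 2, 2, 0],
                 [1, 1, 2, 1]])  # 3 runs x 4 objects (the study used 50 runs)
mode, certainty, refined = consensus_labels(runs)
print(refined)  # [1 0 2 0]
```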
Outside the flood inundation period (July-January), a post-classification correction was applied to open water areas that were falsely identified as a result of smooth, low-backscatter fluvial deposits and were not sufficiently removed by the classification certainty refinement. For each image within this period, a mask was applied based on the predicted area of open water (which includes the false positives). In VV polarisation, backscatter values for open water and for low-backscattering dry areas were found to be relatively distinct (as illustrated in Figure 5). This distinction was exploited through Otsu thresholding, relabelling areas above the threshold as dry. Despite providing an improved classification, this refinement did result in some misclassification of open water as dry, particularly at the fringes of permanent water bodies such as the trunk Zambezi river. To resolve these issues, a permanent water mask was derived from the original, pre-refinement classification results, covering areas where open water was predicted in 100% of the scenes. Although the use of classification certainty was shown to eliminate much of the confusion between vegetated water bodies and urban settlements, following the example of Plank et al. [44], the open source Global Urban Footprint (GUF) dataset, derived from TanDEM-X imagery [74], was used to mask out urban areas.
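The dry season correction can be sketched as follows, with a self-contained implementation of Otsu's method (scikit-image's `threshold_otsu` would be an alternative). The sketch assumes that `labels` and `vv_db` are matching arrays (per pixel or per object), that the permanent water mask is used to keep open water wherever it was predicted in every scene before refinement, and that codes 0 and 1 mean dry and open water. These codes and the function names are illustrative, and the urban (GUF) mask is omitted for brevity.

```python
import numpy as np

DRY, OPEN = 0, 1  # hypothetical class codes


def otsu_threshold(values, bins=256):
    """Otsu's method: the histogram split that maximises between-class variance."""
    v = np.asarray(values, dtype=float)
    v = v[np.isfinite(v)]
    hist, edges = np.histogram(v, bins=bins)
    centres = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)            # counts at or below each bin
    w1 = w0[-1] - w0                # counts above each bin
    s0 = np.cumsum(hist * centres)
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    return centres[np.argmax(w0 * w1 * (mu0 - mu1) ** 2)]


def permanent_water_mask(label_stack):
    """True where open water was predicted in every scene (pre-refinement)."""
    return (np.asarray(label_stack) == OPEN).all(axis=0)


def refine_open_water(labels, vv_db, permanent):
    """Relabel predicted open water brighter than the Otsu threshold as dry."""
    labels = np.asarray(labels)
    vv_db = np.asarray(vv_db, dtype=float)
    is_open = labels == OPEN
    t = otsu_threshold(vv_db[is_open])
    # Permanent water is protected from relabelling.
    to_dry = is_open & (vv_db > t) & ~np.asarray(permanent, dtype=bool)
    refined = labels.copy()
    refined[to_dry] = DRY
    return refined, t
```

The threshold is computed only from objects already predicted as open water, so it adapts to each scene instead of relying on a fixed dB value.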
Resulting thematic classifications were validated using stratified random points, 800-1,000 per scene (total = 9,880; open water = 3,215; vegetated water = 3,593; other = 3,072), using scenes (n = 11) spanning the temporal range and covering different hydrological conditions. Validation used 2 m Pleiades optical imagery acquired during the wet season (March 2017) and at the beginning of the dry season (July 2017). Additional validation used 10 m Sentinel-2 imagery (February, April, May, June, August, September, December). Despite being used for mapping wetlands in the past [84][85][86], optical imagery is not an ideal tool for observing vegetated water bodies, particularly where vegetation canopies are completely closed, obscuring the water surface below. In this study, interpretation of optical validation imagery was supported by field observations made during fieldwork campaigns (September 2018, May 2018 and March 2017) that included water body characterisation and point measurements of water depth and vegetation height (although these were not sufficient in number for use as an independent ground-truth dataset).
In the case of fully vegetated water bodies, these features tend to be spectrally distinct, with very high near-infrared reflectance due to internal leaf cell structure as well as the planophile leaf orientation associated with matted grasses and floating vegetation. Combined with the context in which the feature occurs (for instance, its position within a known marsh area), this enables the use of this imagery for identifying vegetated water bodies even where vegetation canopies are closed.
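Stratified random sampling of validation points can be sketched as drawing a fixed number of pixels at random from each mapped class. Equal allocation per class is our assumption (the per-class totals above are roughly balanced, but the allocation rule is not stated), and the function name is hypothetical.

```python
import numpy as np


def stratified_random_points(class_map, n_per_class, rng):
    """Draw up to n_per_class random (row, col) locations from each mapped class."""
    class_map = np.asarray(class_map)
    points = {}
    for c in np.unique(class_map):
        rows, cols = np.nonzero(class_map == c)
        pick = rng.choice(rows.size, size=min(n_per_class, rows.size), replace=False)
        points[int(c)] = np.column_stack([rows[pick], cols[pick]])
    return points


# Example: 300 points per class for a three-class map gives 900 points per scene.
class_map = np.random.default_rng(0).integers(0, 3, size=(200, 200))
pts = stratified_random_points(class_map, 300, np.random.default_rng(1))
print({c: len(p) for c, p in pts.items()})  # {0: 300, 1: 300, 2: 300}
```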
Standard accuracy assessment metrics were calculated, including user's and producer's agreement (%) for each class and the corresponding overall accuracy (%) and Kappa scores. It is increasingly accepted that Kappa has significant limitations that can render it misleading [87]. We therefore also calculated quantity and allocation disagreement, a proposed replacement for Kappa that describes how well the mapped products represent the area covered by each class (quantity disagreement) as well as the spatial agreement between mapped areas and the validation data (allocation disagreement). A full description and discussion of these metrics can be found in Pontius and Millones [87], as well as descriptions and applications in other reported studies [88][89][90][91][92][93].
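These metrics can all be computed from a confusion matrix. The sketch below follows the quantity and allocation disagreement definitions of Pontius and Millones [87]. When the mapped area of each class is supplied, rows are first converted to estimated population proportions, which is one way class area can enter the assessment. Rows are mapped classes and columns are reference classes; the function name and this orientation are our conventions.

```python
import numpy as np


def accuracy_metrics(counts, map_area_fraction=None):
    """Accuracy metrics from a confusion matrix of validation points.

    counts[i, j]: points mapped as class i whose reference class is j.
    User's/producer's accuracy, overall accuracy and kappa use the raw counts.
    If map_area_fraction (share of map area per mapped class) is given, the
    disagreement metrics use estimated population proportions instead.
    """
    n = np.asarray(counts, dtype=float)
    users = np.diag(n) / n.sum(axis=1)
    producers = np.diag(n) / n.sum(axis=0)
    p = n / n.sum()
    overall = np.trace(p)
    expected = (p.sum(axis=1) * p.sum(axis=0)).sum()
    kappa = (overall - expected) / (1.0 - expected)
    if map_area_fraction is not None:
        w = np.asarray(map_area_fraction, dtype=float)
        p = n / n.sum(axis=1, keepdims=True) * (w / w.sum())[:, None]
    row, col, diag = p.sum(axis=1), p.sum(axis=0), np.diag(p)
    quantity = np.abs(row - col).sum() / 2.0
    allocation = (2.0 * np.minimum(row - diag, col - diag)).sum() / 2.0
    agreement = diag.sum()  # C; note quantity + allocation == 1 - C
    return {"users": users, "producers": producers, "overall": overall,
            "kappa": kappa, "quantity": quantity, "allocation": allocation,
            "agreement": agreement}


r = accuracy_metrics([[50, 10], [5, 35]])
print(round(r["quantity"], 3), round(r["allocation"], 3))  # 0.05 0.1
```

Passing `map_area_fraction` (for example, a dry season scene in which most of the map is dry land) changes C, quantity and allocation but not the raw overall accuracy. This illustrates why the two kinds of score can diverge.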

Results
The approach for mapping both open and vegetated water bodies was applied to 59 Sentinel-1 radar images of Barotseland over the period 2016-2018. The accuracy assessment revealed strong agreement with optical validation data, with a mean overall accuracy of 92% (kappa = 0.9) (Table 2). Accuracy ranged from 84% in November 2016 to 97% in April 2018. This study represents the first attempt to map vegetated water bodies using Sentinel-1 alone. Both user's and producer's accuracies for vegetated water bodies indicated strong agreement with the validation data, with mean scores of 94% (user's) and 87% (producer's) (Table 2). Nine of the eleven assessed epochs showed broad agreement between Kappa/overall accuracy (%) and the overall agreement score (C; Table 3), which accounts for class quantity and allocation agreement, but for two epochs (6/11/2016 and 13/10/2016), Kappa and overall accuracy (%) underestimated the apparent accuracy of the classification. This was a direct result of Kappa/overall accuracy (%) not accounting for the relative area covered by each class. This is particularly important for the present study, in which dry season images are likely to be dominated (> 90% of total area) by dry land, so the errors associated with this class should carry greater influence on the overall assessment of accuracy. The quantity and allocation disagreement metrics demonstrate good overall agreement with the validation data (mean C = 0.86, s.d. = 0.07). The allocation disagreement scores demonstrate a strong ability of the system to accurately predict the spatial location of water (both open and vegetated) and dry land. The quantity disagreement scores show that the main source of error lies in accurately predicting the area of each class, particularly during wetter periods (i.e., Feb 2018, May 2018 and Mar 2017).
This is due to false positives for dry land, where inundated vegetation was falsely classified as dry land, and should be a target for future development of the classification routine.
Figure 6 shows the relative importance of variables in the Extra Trees classification for each image. The standard deviation metrics tend not to be important, consistently contributing less than 10% to the classifier. This is due to the relatively small size of the objects: larger objects would make standard deviation more meaningful, but at the cost of detail in the classification product, with smaller water bodies being missed entirely. Average VV backscatter was the most important variable, with an average contribution of more than 18%. This reflects the ability of VV imagery to differentiate between high-backscatter vegetated water bodies and low-backscatter open water within an Extra Trees classifier. Equally, average VV backscatter can differentiate smooth open water from relatively rough non-water objects with higher backscatter. The importance of all variables varied over time (between 16% and 39% contribution for important variables), demonstrating the need for adaptive strategies when delineating water; hard thresholds set independently of time would likely lead to temporally dependent errors. Average VV backscatter becomes less important during March-April. These are the wettest months and include the greatest coverage of wetted vegetation. In these months, accurate mapping of both open and vegetated water bodies requires contributions from the other variables (except the standard deviation metrics), a result of the increasing complexity of the landscape.
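Per-variable contributions of the kind shown in Figure 6 are available from a fitted scikit-learn Extra Trees model through its impurity-based `feature_importances_` attribute, which sums to 1. The sketch below converts these values to percentages. Whether the study used this attribute is an assumption, and the twelve feature names (minimum, maximum, mean and standard deviation of VV, VH and VV/VH) are hypothetical column labels.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical column names: min/max/mean/std of each band per image object.
FEATURES = [f"{band}_{stat}" for band in ("VV", "VH", "VVVH")
            for stat in ("min", "max", "mean", "std")]


def importance_percent(model, names=FEATURES):
    """(feature, % contribution) pairs from a fitted tree ensemble, largest first."""
    pct = 100.0 * np.asarray(model.feature_importances_)
    order = np.argsort(pct)[::-1]
    return [(names[i], float(pct[i])) for i in order]


# Toy example: only the mean VV column (index 2) carries the class signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, len(FEATURES)))
y = (X[:, 2] > 0).astype(int)
model = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
print(importance_percent(model)[0][0])  # VV_mean
```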
The consensus approach to refining the classification output (classification certainty based on 50 iterations of the classifier) had a variable influence across the image archive. On average, 7.2% of classified objects were changed due to classification uncertainty. The influence increased during the dry season, with up to 20% of classified objects being changed due to uncertainty associated with dry flat sediments that would otherwise be incorrectly classified as open water. The post-classification refinement (classification certainty and dry season Otsu thresholding of water masks) improved classification accuracy by up to 7% and reduced errors of commission associated with open water areas by up to 33% (illustrated in Figure 7). Due to the lack of extensive urbanisation in the region, the addition of the Global Urban Footprint had a negligible positive influence on the results.

As outlined in the introduction, the C-band radar signal from vegetated water bodies is expected to saturate at a particular level of vegetation canopy density [40].
As a consequence, the classification scheme outlined in this paper is only applicable to vegetated water bodies with vegetation cover below a particular density. Figure 8 shows the relationship between VV and VH backscatter (dB) for 10,000 points over vegetated water bodies for a peak flood scene (11 May 2018). In this instance, almost 70% of points lying on vegetated water bodies were not identified by the classification routine, due to saturation of the backscatter signal over relatively dense canopies.

Field observations show that the areas where vegetated water bodies were successfully identified are characterised by relatively low (generally < 20 cm tall) grasses with a degree of spacing between grass stands (~20 cm apart) (Figure 9). The identification of vegetated water bodies with taller grass and vegetation stands (generally > 50 cm) was not possible due to saturation of the radar backscatter signal. In many instances, the vegetation canopy completely obscured the water surface below, to the extent that water was not observable even in field photographs (Figures 10 and 11). Conversely, during drier periods, false positives for vegetated water bodies occur where a mixture of stubble vegetation and flat underlying soil creates a high-backscatter double-bounce effect.

Mapped water body extent per Sentinel-1 epoch showed a wide seasonal range (Figure 12), with a total extent typically less than 1000 km² during the dry season (July-January) and a maximum extent between 7600 km² and 8500 km² at the peak of the wet season. The peak of the wet season was in mid-March in 2017, shifting to early April in 2018. The 2018 maximum water extent was 11% greater than that of 2017. For the period studied, 70% of the total water extent mapped was attributed to vegetated water, with the peak in the maximum extent of inundated vegetation being followed by a peak in open water extent approximately two weeks later.

Mapped water bodies over the study's time period indicate that the hydrological landscape of Barotseland is highly dynamic, with maximum water extent expanding across much of the area and permanent water largely restricted to the trunk Zambezi River and raised peat bogs on the escarpment known as 'dambos' (Figure 13; example dambo shown in Figure 9). In the northeast of the study area, water bodies associated with the Luena tributary sub-catchment tend to be vegetated. This is in contrast to the floodplain of the Zambezi, which during the wet season typically consists of a mixture of vegetated and open water bodies.
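Extent figures such as these reduce to pixel counts multiplied by pixel area. A minimal sketch, assuming each classified epoch is an integer raster with hypothetical class codes and 10 m square pixels (the nominal pixel spacing of Sentinel-1 IW GRD high-resolution products; the study's actual output grid is not stated in this section). The vegetated-water share is read here as the proportion of all mapped water summed across epochs, which is also an assumption.

```python
import numpy as np

# Hypothetical class codes for a classified Sentinel-1 epoch.
OTHER, OPEN_WATER, VEGETATED_WATER = 0, 1, 2


def water_extent_km2(class_map, pixel_size_m=10.0):
    """Return (open, vegetated) water area in km² for one classified epoch.

    Assumes square pixels of `pixel_size_m` metres; adjust if data were resampled.
    """
    pixel_area_km2 = (pixel_size_m / 1000.0) ** 2
    class_map = np.asarray(class_map)
    open_km2 = np.count_nonzero(class_map == OPEN_WATER) * pixel_area_km2
    veg_km2 = np.count_nonzero(class_map == VEGETATED_WATER) * pixel_area_km2
    return open_km2, veg_km2


def vegetated_share(class_maps, pixel_size_m=10.0):
    """Share of all mapped water, summed over epochs, that is vegetated (assumed definition)."""
    extents = np.array([water_extent_km2(m, pixel_size_m) for m in class_maps])
    total = extents.sum()
    return extents[:, 1].sum() / total if total > 0 else float("nan")
```

Plotting `sum(water_extent_km2(m))` against acquisition date for each epoch would give a seasonal curve of the kind shown in Figure 12.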

Water Body Detection
The results indicate that both open water bodies and water bodies vegetated with low (generally < 20 cm) grasses can be mapped with a high degree of accuracy in Barotseland, Western Zambia, using Sentinel-1 radar imagery. For the period studied, the majority of the total water extent mapped was vegetated water (Figure 12), highlighting the importance of mapping both open and vegetated water bodies for surface water mapping, particularly in the wet season, whereas many existing approaches focus only on mapping non-vegetated water bodies [1,2,5,22,24,27,28,30].
In most water body mapping exercises, one would expect the dominant land cover type, in terms of area, to be dry land. In these instances, it is especially important that the relative area covered by each class is considered when assessing the accuracy of the classification, using measures such as quantity and allocation disagreement [87]. This study demonstrated that percentage overall accuracy and the Kappa statistic underestimated the apparent accuracy of the classification for some epochs, due to errors associated with the dominant dry land ('other') class.
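The effect of area weighting can be made concrete. This is a minimal sketch, assuming a confusion matrix of validation sample counts (rows = mapped class, columns = reference class) converted to estimated population proportions with mapped-area weights, and the quantity (Q) and allocation (A) disagreement definitions of Pontius and Millones (2011). Reading the overall agreement score C as 1 - Q - A is an assumption, and the counts and area fractions below are invented for illustration.

```python
import numpy as np


def area_weighted_proportions(counts, map_area_fraction):
    """Convert sample counts into estimated population proportions.

    counts[i, j]: samples mapped as class i whose reference class is j.
    map_area_fraction[i]: fraction of the mapped area in class i (sums to 1).
    """
    counts = np.asarray(counts, dtype=float)
    weights = np.asarray(map_area_fraction, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    return counts / row_totals * weights[:, None]


def overall_accuracy_and_kappa(counts):
    """Unweighted overall accuracy and Cohen's kappa from raw sample counts."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p_observed = np.trace(counts) / n
    p_expected = (counts.sum(axis=1) * counts.sum(axis=0)).sum() / n**2
    return p_observed, (p_observed - p_expected) / (1 - p_expected)


def quantity_allocation(p):
    """Quantity (Q) and allocation (A) disagreement from population proportions."""
    diag = np.diag(p)
    mapped = p.sum(axis=1)     # proportion of area mapped as each class
    reference = p.sum(axis=0)  # proportion of area that truly is each class
    q = np.abs(mapped - reference).sum() / 2
    a = (2 * np.minimum(mapped - diag, reference - diag)).sum() / 2
    return q, a


# Invented counts, class order [dry land ('other'), open water, vegetated water]:
# dry land is mapped well, the two water classes less so.
counts = [[98, 1, 1], [10, 80, 10], [10, 10, 80]]
oa, kappa = overall_accuracy_and_kappa(counts)
q, a = quantity_allocation(area_weighted_proportions(counts, [0.92, 0.03, 0.05]))
print(f"OA={oa:.3f} kappa={kappa:.3f} Q={q:.3f} A={a:.3f} 1-Q-A={1 - q - a:.3f}")
```

With these invented numbers, the unweighted overall accuracy is 0.86, while the area-weighted agreement 1 - Q - A is higher, because the class covering 92% of the area is mapped well. This is the same direction of difference reported for the two dry season epochs. Q + A always equals the area-weighted proportion of disagreement.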
Major sources of error in the classification system were identified as false positives: mapping dry land over areas of inundated vegetation. In part, this was due to saturation of the radar backscatter signal over dense canopies. This follows findings from other studies [40,46] and highlights the importance of defining the types of vegetation over which water bodies are presumed to be mappable. In this instance, using C-band Sentinel-1, the mappable vegetation types were characterised as grasses growing in tussocks or clumps, which allow a double-bounce signal between the water surface and vegetation stems to be observed. In the Barotseland study area, this vegetation type was estimated to account for approximately three-quarters of areas prone to flooding, with extensive floating vegetation mats and thick, tall grasses (> 50 cm) tending to obscure the water surface below during the wet season. In this regard, it is likely that more vegetated water bodies will be mappable during the early stages of the wet season, during which water is extensive and vegetation canopies are less dense.
The inclusion of information from longer-wavelength (e.g., L- or P-band) systems would provide a stronger signal over water bodies with denser vegetation canopies, e.g., [44]. Similarly, there is evidence to suggest that the signal over vegetated water bodies may be enhanced through the decomposition of fully polarimetric radar data [9,44,94]. However, a key focus of this study was to exploit freely available, high temporal frequency Sentinel-1 imagery. The free availability of high temporal frequency (sub-weekly) L- or P-band data would represent a significant development for water body mapping. Such data, as well as decomposition metrics, can be easily incorporated into the object-based machine learning classification structure proposed in this paper.
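To illustrate how additional inputs slot into an object-based feature table, here is a minimal sketch, assuming a segmentation already provides contiguous integer segment IDs (0 to n-1) and that every band, whether Sentinel-1 VV/VH or a hypothetical L-band or decomposition layer, is co-registered on the same grid. It computes the per-object mean and standard deviation variables referred to for Figure 6 using `np.bincount`; each new band simply adds two columns. The function and band names are illustrative only.

```python
import numpy as np


def object_features(segments, bands):
    """Per-object mean and (population) standard deviation for each band.

    segments: int array (H, W) of segment IDs 0..n-1, all IDs present.
    bands: dict mapping a band name (e.g. 'VV', 'VH', or a hypothetical
           'L_HH') to an (H, W) float array on the same grid.
    Returns (column_names, feature_table) with one row per segment.
    """
    ids = np.asarray(segments).ravel()
    n_objects = ids.max() + 1
    counts = np.bincount(ids, minlength=n_objects).astype(float)
    names, columns = [], []
    for name, band in bands.items():
        values = np.asarray(band, dtype=float).ravel()
        mean = np.bincount(ids, weights=values, minlength=n_objects) / counts
        mean_sq = np.bincount(ids, weights=values**2, minlength=n_objects) / counts
        # Clip tiny negative values caused by floating-point rounding.
        std = np.sqrt(np.maximum(mean_sq - mean**2, 0.0))
        names += [f"mean_{name}", f"std_{name}"]
        columns += [mean, std]
    return names, np.column_stack(columns)
```

Because each segment's standard deviation is estimated from only its own pixels, small objects give noisy standard deviations, which is consistent with the low importance of those metrics reported above.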
Being dominated by a large seasonally inundated river floodplain, the Barotseland landscape is complex and varies significantly throughout the year. In this study, the machine learning classifier included a number of independent variables largely based on the radar image statistics. The individual importance of these variables varied over time, reflecting the dynamic nature of the landscape and highlighting the importance of adaptive strategies for classifying flood inundation. In addition, the consensus approach used in this classification scheme (50 iterations) had a significant effect on the final classification, changing the final class of up to 20% of image objects. This was valuable for limiting confusion between open water and other low-backscatter surfaces such as flat dry sand, which are a source of considerable error in radar-based flood water mapping [29]. Remaining errors of this kind were handled efficiently by using Otsu thresholding to refine areas classified as open water. The combination of the consensus approach and refinement of individual classes (in this instance using Otsu-derived thresholds) helped to maximise the separability between similar land cover types, providing a potential solution for other land cover mapping exercises beyond water body mapping.
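The two refinement steps can be sketched together. This is a minimal illustration under several assumptions that this section does not confirm. First, the class predictions from the 50 classifier runs (for example, runs with different random seeds) are already available. Second, objects whose modal class wins fewer than a hypothetical 80% of runs are reassigned to 'other'. Third, Otsu's threshold is computed on VV backscatter (dB) of objects labelled open water, with confused dry sand assumed to be brighter than true water in that band. The class codes, threshold and band choice are all hypothetical.

```python
import numpy as np

# Hypothetical class codes.
OTHER, OPEN_WATER, VEGETATED_WATER = 0, 1, 2


def consensus(predictions, min_agreement=0.8):
    """Majority vote across repeated classifier runs.

    predictions: int array (n_runs, n_objects). Objects whose modal class wins
    fewer than `min_agreement` of the runs are reassigned to OTHER.
    """
    predictions = np.asarray(predictions)
    n_runs = predictions.shape[0]
    n_classes = int(predictions.max()) + 1
    # votes[c, k] = number of runs assigning class c to object k
    votes = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    agreement = votes.max(axis=0) / n_runs
    labels = np.where(agreement >= min_agreement, votes.argmax(axis=0), OTHER)
    return labels, agreement


def otsu_threshold(values, n_bins=256):
    """Otsu's method: the histogram split that maximises between-class variance."""
    hist, edges = np.histogram(np.asarray(values, dtype=float), bins=n_bins)
    centres = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist).astype(float)  # samples in bins 0..k
    w1 = w0[-1] - w0                    # samples in bins k+1..end
    s0 = np.cumsum(hist * centres)
    m0 = np.divide(s0, w0, out=np.zeros_like(s0), where=w0 > 0)
    m1 = np.divide(s0[-1] - s0, w1, out=np.zeros_like(s0), where=w1 > 0)
    between = w0 * w1 * (m0 - m1) ** 2
    return edges[np.argmax(between) + 1]  # upper edge of the best split bin


def refine_open_water(labels, backscatter_db):
    """Reassign open-water objects brighter than an Otsu threshold to OTHER."""
    labels = np.asarray(labels)
    backscatter_db = np.asarray(backscatter_db, dtype=float)
    is_open = labels == OPEN_WATER
    threshold = otsu_threshold(backscatter_db[is_open])
    refined = labels.copy()
    refined[is_open & (backscatter_db > threshold)] = OTHER
    return refined, threshold
```

Computing the threshold only from objects already labelled open water keeps it adaptive to each epoch, in line with the argument above against hard thresholds fixed independently of time.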
This study did not carry out a formal evaluation of the segmentation approach. The accuracy assessment identified that a number of narrow channels (< 30 m) were lost due to a combination of the Lee image filter and the segmentation process. The absence of these channels from the resulting water body product has consequences for its use in defining waterway transportation routes and, in turn, access to health services. In effect, these narrow channels may represent critical waterways for reaching wider navigable channels such as the trunk Zambezi River. Future work should focus on developing open-source segmentation techniques that preserve linear features.
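Why a speckle filter erodes narrow channels can be illustrated with a simplified Lee filter. This is a minimal sketch, assuming an additive-noise form of the filter (the classic Lee filter uses a multiplicative speckle model, and the study's window size and implementation are not given here), a hypothetical 5 x 5 window, and a synthetic scene crossed by a dark channel two pixels wide, about 20 m at an assumed 10 m pixel spacing.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lee_filter(img, size=5, noise_var=None):
    """Simplified Lee filter (additive-noise form) on a 2-D intensity image.

    Each pixel is pulled towards its local mean with weight
    w = local_var / (local_var + noise_var): flat areas are smoothed strongly,
    high-variance areas (edges, narrow features) less, but only partially.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    img = np.asarray(img, dtype=float)
    pad = size // 2
    windows = sliding_window_view(np.pad(img, pad, mode="reflect"), (size, size))
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))
    if noise_var is None:
        noise_var = local_var.mean()  # crude scene-wide noise estimate (assumption)
    denom = local_var + noise_var
    weight = np.divide(local_var, denom, out=np.zeros_like(local_var), where=denom > 0)
    return local_mean + weight * (img - local_mean)


# Synthetic scene: bright land (linear intensity 0.1) crossed by a dark channel.
scene = np.full((41, 41), 0.1)
scene[:, 20:22] = 0.006
filtered = lee_filter(scene, size=5)
print("channel minimum before:", scene[:, 20:22].min(), "after:", filtered[:, 20:22].min())
```

The channel pixels come out brighter than in the input, so their contrast with the surrounding land is reduced before segmentation begins. A smaller window preserves more of the channel at the cost of weaker speckle suppression.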

Hydrological Findings and Implications for Public Health
The Barotseland landscape is dominated by herbaceous wetlands with emergent or floating vegetation, rather than by open water. This is significant in terms of the ability of a water body to support juvenile mosquito development: the presence of vegetation within a water body can provide Anopheles mosquito larvae with a refuge from predators [50,95]. The dynamic nature of the hydrological landscape has important implications for vector mosquito ecology and subsequent disease transmission.
Specifically, some water bodies in the north-eastern part of the study area (Luena floodplain) are shown to persist for much of the hydrological year, particularly near Sitoya (Figure 15). These persistent water bodies potentially provide a refuge for mosquito species through the dry season, when aquatic habitat availability is limited [23]. Furthermore, the Luena River is characterised as a highly diffuse anabranching river system due to deposition and subsequent channel avulsion. Together with the persistent presence of vegetation, this is indicative of shallow, slow-moving or still water bodies, characteristics that are consistently associated with productive mosquito habitats [49,96-98]. Malaria case data indicate that this region sustained relatively high transmission rates in both the wet and dry seasons of 2017, particularly at Sitoya, Ushaa and Nangili, which lie within or next to this drainage system (Figure 15).
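Persistence of this kind can be summarised from the stack of per-epoch water masks. This is a minimal sketch, assuming boolean masks (open or vegetated water) on a common grid and ordered in time. The function names are hypothetical, and because Sentinel-1 epochs are irregularly spaced, run lengths counted in epochs would still need acquisition dates to be converted into durations.

```python
import numpy as np


def inundation_summary(water_masks):
    """Per-pixel persistence statistics from a time series of water masks.

    water_masks: bool array (n_epochs, H, W), True where water was mapped.
    Returns (frequency, maximum_extent, permanent):
      frequency      - fraction of epochs in which the pixel was wet
      maximum_extent - wet in at least one epoch
      permanent      - wet in every epoch
    """
    masks = np.asarray(water_masks, dtype=bool)
    return masks.mean(axis=0), masks.any(axis=0), masks.all(axis=0)


def longest_wet_run(water_masks):
    """Longest run of consecutive wet epochs per pixel (epochs in time order)."""
    masks = np.asarray(water_masks, dtype=bool)
    current = np.zeros(masks.shape[1:], dtype=int)
    best = np.zeros_like(current)
    for wet in masks:
        # Extend the run where wet, reset to zero where dry.
        current = np.where(wet, current + 1, 0)
        best = np.maximum(best, current)
    return best
```

Maps of `maximum_extent` and `permanent` correspond to the maximum and permanent water layers described for Figure 13. A high `longest_wet_run` flags candidate dry season refugia such as those near Sitoya.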
The dambo features located on the escarpment in the east of the study area remain inundated for much of the year (Figures 13-15). These water bodies are also characterised as shallow, sunlit and typically vegetated (Figures 3F and 9). As such, the dambos, coupled with the shallow, slow-moving channels that connect them, act as important sources of malaria mosquito larvae, causing widespread transmission in the 2017 wet season and sustaining relatively high numbers of cases in areas such as Ikabako, Miulwe, Kalundwana and Luandui. Given their importance in driving transmission throughout the year, these hydrological features, as well as permanently flooded regions in the Luena system, could be an important target for larval source management interventions.

Towards the south-eastern flank of the Zambezi Floodplain, half a degree south of Mongu, contributions of water from the escarpment allow water to persist for longer than in areas close to the main Zambezi channel. This is illustrated by the sequentially mapped inundated areas during the draw-down period (Figure 14), which show that water bodies persist in these areas almost three months after the peak of the 2017 wet season.
Again, water bodies in this area share characteristics with productive mosquito aquatic habitats (shallow, sunlit, still or slow-moving), but despite their persistence, malaria transmission in 2017 was generally limited to the wet season (Figure 15). Field observations showed that these water bodies were relatively clear and low in temperature during the dry season; such conditions have been shown in other studies to be less conducive to oviposition and larval development [49,99,100], so these water bodies may not represent important vector aquatic habitats in this instance.
When developing tools for use in public health, it is important to consider the degree to which the approach can be adopted by the people who need it most, in this instance public health managers in malaria-endemic regions. The classification system presented in this study makes use of freely available, fully pre-processed radar imagery and globally available ancillary datasets. Furthermore, all processing was conducted within an open-source software environment, increasing the likelihood of this automated classification system being used operationally. However, to ensure the full uptake of this classification system and the resulting mapped products, future work should centre on incorporating these methods within a cloud-based service, such as Google Earth Engine, to avoid dependency on physical data storage and processing resources. It is also important not to over-interpret water body maps alone when delineating malaria risk and defining subsequent disease control strategies. Rather, careful consideration needs to be given to the complex relationship between habitat location and persistence, alongside adult mosquito dispersal mechanisms, human population densities and socio-economic factors.

Conclusions
This paper presents an automated, tractable solution for mapping both open and vegetated water bodies in a large floodplain environment with a high degree of accuracy using Sentinel-1 radar imagery and openly available ancillary datasets. The use of open data, combined with open-source software, provides an approach with considerable operational potential for use within public health programmes where resources may not permit the acquisition of commercial imagery.
The resulting water body products demonstrate that the Barotseland area is highly dynamic, including large extents of inundated vegetation throughout the year, but particularly during the wet season. This dynamism highlights the need for reliable routines for mapping both open and vegetated water bodies to inform public health as well as other sectors such as water resource management, riverine transportation and flood risk management.
Despite providing freely available, timely information on water body locations, Sentinel-1 C-band radar is limited in its ability to provide a signal from vegetated water bodies unless grasses are < 20 cm in height with sufficient gaps between grass tussocks or clumps. Future provision of freely available L- or P-band imagery would greatly improve the classification of vegetated water bodies, particularly for regions with dense forest canopies [20]. The classification structure outlined in this paper is designed to incorporate this kind of data easily, with automatic generation of training data for both open water and flooded vegetation, and robust machine learning algorithms, including automatic selection of optimal parameters.
The developed water body products for Barotseland indicate a number of areas where surface water persists beyond the wet season, and in some cases throughout the year. The identification of these areas is significant in terms of public health risk mapping as these areas are likely to maintain mosquito densities throughout the year, facilitating longer malaria transmission seasons in these parts of Barotseland, and potentially forming 'source' populations for other areas when suitable habitats reappear. As well as representing an increased risk in terms of malaria transmission, these areas represent a potential target for disease management strategies either directly against the vectors through LSM or by indicating high-transmission zones where patient-based interventions can also be focussed.
We believe that in many ecological settings, targeting interventions based on operational, dynamic mapping of the breeding habitat for mosquito vectors using radar remote sensing is both feasible and potentially an important element of campaigns for disease elimination. But for this approach to be effective and efficient, we need reliable maps of both open and vegetated water. The methods developed in this paper therefore could provide an important tool for public health in malaria-ecological settings found in the Upper Zambezi and other major flooding river systems in Africa. Our work also highlights the need for widely available, high space-time resolution L-band SAR.