High-Resolution Surface Water Classifications of the Xingu River, Brazil, Pre and Post Operationalization of the Belo Monte Hydropower Complex

We describe a new high spatial resolution surface water classification dataset generated for the Xingu river, Brazil, from its confluence with the Iriri river to the Pimental dam, both prior to construction of the Belo Monte hydropower complex and after its operationalization. This river is well known for its exceptionally high diversity and endemism in ichthyofauna. Pre-existing datasets generated from moderate resolution satellite imagery (e.g., 30 m) do not adequately capture the extent of the river. Accurate measurements of water extent are important for a range of applications utilizing surface water data, including greenhouse gas emission estimation, land cover change mapping, and habitat loss/change estimates, among others. We generated the new classifications from RapidEye imagery (5 m pixel size) for 2011 and PlanetScope imagery (3 m pixel size) for 2019 using a Geographic Object Based Image Analysis (GEOBIA) approach.
Dataset: https://doi.org/10.6084/m9.figshare.12521900.v1
Dataset License: CC-BY 4.0

accurately document the changes (e.g., [12,13]). Continental-scale high-spatial resolution fluvial mapping initiatives such as [14] are not yet available for South America. The data readily available to Brazilian agencies and others studying these regions are usually generated from moderate resolution images with pixel sizes of 30 m or more, and can lead to substantial uncertainties in calculations of impact area, greenhouse gas emissions, habitat loss, etc. [15]. In order to quantify the extent of the flooded area due to the main and artificial reservoirs of the dam, we developed a high spatial resolution surface water classification for 2011 (pre Belo Monte) and 2019 (after operationalization of the dam). These datasets were produced from RapidEye (5 m) and PlanetScope (3 m) imagery classified with a Geographic Object Based Image Analysis (GEOBIA) approach [14,16].

Data Description
The dataset available for download from figshare (https://doi.org/10.6084/m9.figshare.12521900.v1) comprises two high spatial resolution surface water maps (dry season) for 2011 and 2019. The extent of the surface water classification spans from the confluence of the Xingu and Iriri rivers to the Pimental dam, downstream of the city of Altamira (Figure 1). For the 2019 dataset, the artificial reservoir is also included, beyond the power station where the outflow rejoins the Xingu river (Belo Monte dam). The datasets are provided as ESRI shapefiles and in GeoTIFF format (native 5 m and 3 m pixel sizes for 2011 and 2019, respectively). In the GeoTIFFs, pixels represent one of two classes: water (1) or land (0). All datasets are projected in UTM 22S WGS84.
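As a minimal sketch of how the binary rasters might be used, the snippet below converts a count of water pixels into an area. It assumes the raster has already been read into a nested list of 0/1 values with whatever GeoTIFF reader the user prefers; the function name `water_area_km2` and the toy grid are hypothetical and are not part of the dataset.

```python
def water_area_km2(mask, pixel_size_m):
    """Return the area (km^2) covered by water pixels (value 1) in a binary mask."""
    water_pixels = sum(1 for row in mask for value in row if value == 1)
    # Each square pixel covers pixel_size_m ** 2 square metres; 1 km^2 = 1e6 m^2.
    return water_pixels * pixel_size_m ** 2 / 1_000_000


# Toy 3 x 3 grid, not real data: 1 = water, 0 = land.
toy_mask = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]

area_5m = water_area_km2(toy_mask, pixel_size_m=5.0)  # 2011-style 5 m pixels
area_3m = water_area_km2(toy_mask, pixel_size_m=3.0)  # 2019-style 3 m pixels
```

Because the two years have different native pixel sizes, the same pixel count corresponds to different areas, so comparisons between 2011 and 2019 should be made on areas rather than on pixel counts.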

Figure 2. Points of comparison between the 2011 and 2019 satellite imagery. In (A), the difference in water level is attributed to yearly and seasonal differences in flow. The 2019 imagery was acquired later in the dry season (August) than the 2011 imagery (July). This area near the confluence with the Iriri river is outside of the influence of the main reservoir. Both (B) and (C) are within the range of the impacts of the main reservoir. A higher water level and clearing of vegetation from the islands can be seen in 2019 in comparison to 2011 (B,C). These comparisons are downstream of Altamira in the sector of the river with the greatest change caused by the main reservoir of the Pimental dam. Many islands can be seen in 2011 (D-F) which are underwater in 2019. In (F), the intake for the canal can be seen on the eastern side of the image from 2019.

Methods
The 2011 dataset is a classification of 10 RapidEye scenes acquired on 4 July (Figure 1A, Table 1). The RapidEye constellation comprises five satellites, each with a multispectral pushbroom imager acquiring five bands from the blue to the near infrared wavelengths. The images are acquired at a nadir ground sampling distance (GSD) of 6.5 m. The orthorectified imagery used here (level 3A) is produced at a 5 m pixel size [17].
The 2019 dataset is a classification of 23 PlanetScope scenes acquired between 24 July and 24 August (Table 2). PlanetScope is a constellation of more than 130 3U form factor CubeSats. The majority of the images used for the classification are from Dove PS satellites [18]. Their 2D frame detector has 6600 columns × 4400 rows. The detector uses a Bayer pattern filter separating the blue, green and red channels. The top half (2200 rows) is used for the RGB bands; a NIR filter restricts the wavelengths of light. In the lower 2200 rows, another filter blocks all but the NIR wavelengths. The RGB half is combined in processing with the NIR half to produce four-band multispectral scenes [18]. Our classification also utilized one image from a next-generation Dove-R PS2.SD satellite (Table 2). The PS2.SD instrument utilizes the same detector as the PS, but rather than a Bayer filter and NIR band-pass filter, a butcher block filter design segments the detector into four sections of 1100 rows of pixels, each with its own filter for one of the four spectral bands. The final multispectral scene is generated by stacking a number of consecutive frames on either side of a given frame [18]. All PlanetScope images used here are 'multispectral analytic surface reflectance' products (acquired at 3.7 m GSD, orthorectified to a 3 m pixel size) [19,20]. Figure 2 illustrates points of comparison between the imagery from the two years along the Iriri confluence to Pimental dam sector. Both the RapidEye and PlanetScope images were downloaded through a subscription from https://www.planet.com/explorer, the web interface to the catalogue of imagery from satellites managed by Planet Labs.
The images from Tables 1 and 2 were classified through a GEOBIA approach in eCognition Developer 9.4 (Trimble Geospatial, Sunnyvale, CA, USA).
GEOBIA aims to replicate and improve upon human interpretation of imagery in an automated manner [21], allowing large areas to be analyzed efficiently [14]; it is effective for land cover classifications of high spatial resolution imagery with a low spectral resolution [16,22]. Images were first segmented with the multiresolution segmentation algorithm using the following parameters: scale = 75, shape = 0.1 and compactness = 0.5. This algorithm is a bottom-up process that begins with one-pixel objects and iteratively merges neighboring objects based on the relative homogeneity criteria of shape and colour [23]. Each of the four bands was given a different weight in the segmentation to maximize the separation of water from other materials: blue = 0.2, green = 0.2, red = 0.5 and NIR = 1 (Figure 3). For consistency, the red-edge band from RapidEye was not used. Training samples of segments representing "water", "forest", "non-forest" and "cloud" were manually selected. For classification, the nearest neighbor feature space comprised the object-level mean values of all four bands, brightness (average of the means of the four bands), maximum difference (maximum difference between bands) and the standard deviations of the objects in all four bands. These metrics served as the training data for a nearest neighbor classification.
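The object-level feature space and nearest neighbor step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the eCognition implementation: the function names, the use of the population standard deviation, the plain Euclidean distance and the toy pixel values are all hypothetical, and eCognition's own definitions (for example any normalization of the maximum difference or of the feature space) may differ.

```python
import math
from statistics import mean, pstdev

BANDS = ("blue", "green", "red", "nir")


def object_features(pixels):
    """Build a 10-element feature vector for one image object.

    `pixels` maps each band name to the list of pixel values in the object.
    Order: 4 band means, brightness, maximum difference, 4 band standard deviations.
    """
    band_means = [mean(pixels[band]) for band in BANDS]
    band_stds = [pstdev(pixels[band]) for band in BANDS]
    brightness = mean(band_means)  # average of the four band means
    max_diff = max(band_means) - min(band_means)  # assumed: largest gap between band means
    return band_means + [brightness, max_diff] + band_stds


def nearest_neighbor(features, training):
    """Return the label of the training sample closest in Euclidean distance."""
    label, _ = min(training, key=lambda item: math.dist(features, item[1]))
    return label


# Toy training objects with made-up reflectance values (not from the dataset).
training = [
    ("water", object_features(
        {"blue": [300, 310], "green": [400, 420], "red": [250, 260], "nir": [100, 110]})),
    ("forest", object_features(
        {"blue": [200, 220], "green": [450, 470], "red": [230, 250], "nir": [3000, 3100]})),
]

unknown = object_features(
    {"blue": [305, 300], "green": [410, 405], "red": [255, 250], "nir": [120, 115]})
predicted = nearest_neighbor(unknown, training)
```

In this toy case the unknown object has low NIR values similar to the water sample, so it is assigned to "water"; with real imagery the training set would contain many samples per class, including "non-forest" and "cloud".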

11-07-19 132719 1032 Artificial reservoir
1 This image is from a next-generation Dove-R PS2.SD instrument; all others are from Dove PS instruments.
2 This image was not used in the classification. It was only used as a guide in editing the vertices of the surface water polygons contaminated by cloud cover in the final classification.
The individual classifications were simplified to "water" and "land" classes through a decision tree in ENVI 5.5 (L3 Harris Geospatial, Boulder, CO, USA) and mosaicked into a single binary raster for each period (2011 and 2019). The mosaics were converted to polygon datasets in ArcMap 10.7 (ESRI, Redlands, CA). Polygons representing the "land" class were removed and each water polygon was inspected through an overlay with the image from which it was generated. Erroneous water polygons, such as those representing dark shadows (from topography or tree crowns), were removed. For the few areas where small clouds obstructed the shoreline, the polygon vertices were edited to trace the shoreline without cloud contamination (Figure 4). Baseline images from either WorldView 1 (acquired in 2011) or other PlanetScope scenes acquired close to the date of the classification (Table 2) were used to guide the vertex editing.
The final edited polygon layer was dissolved to create a single polygon layer representing surface water. For validation, mosaics of the imagery (Tables 1 and 2) were created and points representing water and land were generated through visual interpretation of the image mosaics. Tables 3 and 4 present the confusion matrices for the two classifications. For 2011, 658 and 885 points were generated for water and land (consisting of rock, sand, or vegetation), respectively. For 2019, 750 and 813 points were generated for water and land, respectively. The points were generated throughout the entire study area. Boundary (edge) pixels between classes were avoided due to the potential mixing of materials in these pixels.
The total surface water area was calculated as 426.89 km² for 2011 and 569.63 km² for 2019. We estimate the surface area within the impact zone of the main reservoir to be 220.9 km² in 2011 and 426.4 km² in 2019, a difference of 205.5 km². Figure 5 illustrates the area of greatest change in surface water extent from the Xingu's confluence with the Iriri river to the Pimental dam (including the artificial reservoir) following the operationalization of the Belo Monte dam.
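The area figures reported above can be turned into absolute and relative changes with a few lines of arithmetic. The sketch below uses only the four area values stated in the text; the helper name `area_change` is hypothetical, and the percentages are derived here for illustration rather than taken from the original analysis.

```python
def area_change(before_km2, after_km2):
    """Return (absolute change in km^2, relative change in percent)."""
    delta = after_km2 - before_km2
    return delta, 100.0 * delta / before_km2


# Areas reported in the text (km^2).
total_delta, total_pct = area_change(426.89, 569.63)          # whole study area
reservoir_delta, reservoir_pct = area_change(220.9, 426.4)    # main reservoir impact zone

print(f"Total surface water: +{total_delta:.2f} km^2 ({total_pct:.1f}%)")
print(f"Main reservoir impact zone: +{reservoir_delta:.1f} km^2 ({reservoir_pct:.1f}%)")
```

Under these inputs the total surface water increases by about 142.74 km² (roughly 33%), while the area within the impact zone of the main reservoir increases by 205.5 km² (roughly 93%).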

User Notes
The fine spatial resolution of the imagery from both RapidEye and PlanetScope allowed for a more accurate and detailed classification of surface water than is possible from moderate resolution optical satellite imagery such as Landsat (30 m) or Sentinel-2 (10-60 m) (e.g., [12]). Many different types of rocks, soils and landforms make up the shoreline and small islands (Figures 6 and 7). In areas of higher flow, riverweed (Podostemaceae) can be found adhering to rocks in a range of states from dry plants to green leaves and flowers (Figure 6). These all add to the complexity of the classification. Despite the benefits that the 3-5 m pixel sizes provide, challenges remain where features are smaller than a pixel. Channels that are narrower than the pixel size of the imagery are likely to have been missed in the classification. Small boulders (e.g., Figure 7) in the river may have been misclassified as water if they occupied less than a pixel in area. Conversely, large patches of dead trees in forest flooded by the reservoir (e.g., Figure 7) that occupy areas larger than a pixel may have led to false negatives in the classification.

Figure 6. Field photos showing a range of rock types and sand that are present in the study area outside of the impact zone of the dam located within the pre-Cambrian Complex of the Xingu (see [24] for a description of the geology of the region). Dry and live riverweed (Podostemaceae) can be seen covering many of the surfaces, ranging in colour from green and brown (leaves) to pink (flowers) and white (dry plants). Many of the rock formations are smaller than the pixel size of medium spatial resolution satellite imagery (e.g., 30 m Landsat or 10 m Sentinel-2).
Furthermore, it is important to take into consideration the highly seasonal water flow and the natural flood pulse of the river. The river has four hydrological periods, low water (September-November), flooding (December-February), high water (March-May) and receding water (June-August) [25], with discharge rates ranging from, on average, ~2000 m³/s in October to ~21,000 m³/s in April [8]. It has one of the highest annual variations in flow of all Amazon tributaries. The imagery from 2011 (4 July) is earlier in the receding period than that of 2019 (11-24 August) due to the availability of imagery with minimal cloud cover for the entire area. There is an approximate 1000 m³/s difference in discharge between July and August [8]. The effect of this difference can be seen in the southern sector of the data near the confluence of the Iriri river, outside the impact zone of the reservoir. In Figure 2A, for example, the higher water level is seen in the RapidEye imagery from 2011 in comparison to the PlanetScope image from 2019, where there is a larger amount of rock exposed in the channels. With continuous acquisition of daily revisit satellite imagery, over time it may be possible to acquire minimal cloud cover imagery for the high water periods as well, providing a more thorough assessment of the seasonal extents of the river.
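When combining this dataset with imagery from other dates, it may help to tag each acquisition with its hydrological period. The following is a minimal sketch using the month ranges given above [25]; the dictionary and function names are hypothetical, and the month boundaries are treated as exact calendar months, which is a simplification.

```python
from datetime import date

# Month numbers for each hydrological period, as listed in the text.
HYDROLOGICAL_PERIODS = {
    "low water": (9, 10, 11),
    "flooding": (12, 1, 2),
    "high water": (3, 4, 5),
    "receding water": (6, 7, 8),
}


def hydrological_period(day):
    """Return the name of the hydrological period that contains the given date."""
    for name, months in HYDROLOGICAL_PERIODS.items():
        if day.month in months:
            return name
    raise ValueError(f"No period defined for month {day.month}")


period_2011 = hydrological_period(date(2011, 7, 4))    # RapidEye acquisition date
period_2019 = hydrological_period(date(2019, 8, 24))   # last PlanetScope acquisition date
```

Both acquisition dates fall in the receding water period, so the two classifications are comparable in season even though they differ by several weeks within it.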
Figure 7. Field photos from the zone impacted by the reservoir at the boundary of the pre-Cambrian Complex of the Xingu and the Amazon Sedimentary Basin (see [24] for a description of the geology of the region). The rock formations differ in this sector compared to Figure 6. Cleared islands with flooding and flooded forest (vegetation not cleared prior to flooding) from within the reservoir near the Pimental dam can also be seen.