Dynamic Mapping of Subarctic Surface Water by Fusion of Microwave and Optical Satellite Data Using Conditional Adversarial Networks

: Surface water monitoring with ﬁne spatiotemporal resolution in the subarctic is important for understanding the impact of climate change upon hydrological cycles in the region. This study provides dynamic water mapping with daily frequency and a moderate (500 m) resolution over a heterogeneous thermokarst landscape in eastern Siberia. A combination of random forest and conditional generative adversarial networks (pix2pix) machine learning (ML) methods were applied to data fusion between the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Advanced Microwave Scanning Radiometer 2, with the addition of ancillary hydrometeorological information. The results show that our algorithm successfully ﬁlled in observational gaps in the MODIS data caused by cloud interference, thereby improving MODIS data availability from 30.3% to almost 100%. The water fraction estimated by our algorithm was consistent with that derived from the reference MODIS data (relative mean bias: − 2.43%; relative root mean squared error: 14.7%), and effectively rendered the seasonality and heterogeneous distribution of the Lena River and the thermokarst lakes. Practical knowledge of the application of ML to surface water monitoring also resulted from the preliminary experiments involving the random forest method, including timing of the water-index thresholding and selection of the input features for ML training.


Introduction
The subarctic water cycle has been affected by climate change [1]. To understand the impact of climate change upon the region, widespread broad-scale monitoring of water dynamics and related hydrogeological phenomena has been conducted using satellite remote sensing [2,3]. Among the satellite-observable quantities-terrestrial water storage [4,5], snow cover or snow water equivalent [6], soil moisture [7], and surface water [8]-surface water is the factor that directly interacts with human activity and is a key indicator of local and global hydrological cycles [2].
Spatiotemporally heterogeneous features of surface water created by thermokarst lakes and river networks, which reflect the thawing/freezing of snow (or permafrost) and the related hydrometeorological regime at sub-seasonal to interannual scales, requires observation at a high spatiotemporal resolution to understand the regional water dynamics. Detailed surface water maps are important ancillary data for land-surface modeling [9], soil moisture retrievals [10], and the understanding of methane emissions [11]. Various past studies have provided surface water maps at global scales [12][13][14][15][16]; however, either spatial or temporal details tend to be lost in such existing maps. Maps with high spatial resolution only provide monthly-annual aggregated values, whereas those with high temporal frequency (e.g., daily) rarely have high spatial resolution.
Monitoring with both spatially and temporally fine resolution using a single satellite sensor is generally challenging, owing to technical and financial limitations [17]. Progress involving a large constellation of optical microsatellites [18] may provide spatiotemporally fine-resolution datasets, except in areas frequently obscured by dense cloud cover. However, geometric-and radiometric-calibration activities for each sensor and between sensors are still in progress [19], and their uncertainty is likely to be too large to precisely observe land-surface properties. In addition, because currently such datasets are not free (i.e., they are available for a charge), users often need to limit data purchases to small areas of interest to save on costs. To avoid such concerns, another solution involves data-fusion approaches [20,21] among well-calibrated, open, and free satellite datasets. Traditionally, spatiotemporal data fusion has focused upon the integration of data representing the same land-surface or atmospheric properties (e.g., surface reflectance [22,23], evapotranspiration [24], land-surface temperature [25], leaf-area index [26], and precipitable water vapor [27]) derived from multiple optical or thermal sensors based on physical or quasi-physical models [22,23].
However, empirical approaches using machine learning (ML) have been used for a flexible fusion of datasets with highly different features [28,29]. The fundamental process involves ML training with matched pairs between different types of data and then using the training results to predict spatially high-resolution but temporally low-resolution data from counterpart data (temporally high-resolution but spatially low-resolution). Suzuki and Matuo [2] mentioned that integration between sensors with different features, particularly microwave and optical sensors, may complement shortcomings and add value to hydrological monitoring in the subarctic. In such fusions between different spectral domains, the use of sophisticated ML techniques such as the convolutional neural networks (CNNs) has been recommended as an attractive approach (e.g., [21]). Therefore, we have aimed for data fusion between microwave and optical sensors by combining popular (i.e., random forest) and sophisticated (i.e., pix2pix) ML approaches to obtain open water maps that effectively show subarctic thermokarst lakes with daily frequency. For microwave and optical data, we selected the Advanced Microwave Scanning Radiometer 2 (AMSR2) and Moderate Resolution Imaging Spectroradiometer (MODIS) sensors. AMSR2, which yields passive microwave data, is characterized by a wide swath and high observational frequency (daily), cloud-penetration ability, and coarse spatial resolution (from several to several tens of kilometers, owing to the long microwave wavelength). In contrast, MODIS, which yields optical data, has better spatial resolution (500 m) and is characterized by a relatively low chance (less than daily) of observation of the ground surface owing to cloud interference. Such contrasting features motivated us to apply data fusion between these sets [30].
The pix2pix method involves a type of conditional generative adversarial network (GAN) that interfaces two deep CNNs (generator and discriminator [31]) and enables image-to-image translation [32]. Applications of GAN-based image classification [33], segmentation [34], and super-resolution optical imagery [35] have emerged in the remote sensing community. However, owing to the computational cost and nontrivial setup, the application of this technique for monitoring boreal surface water has been barely explored [3]. The novelty of our research is that it addresses the apparent gap between such applications of state-of-the-art ML methods from computer science and spatiotemporal data fusion for satellite remote sensing. Through several experiments (including the timing of the water-index thresholding and input-feature selections), we also aim to provide practical knowledge for the application of sophisticated ML to surface water monitoring.

Study Site and Period
We selected Yakutsk (62 • :01 -63 • :05 N; 129 • :19 -130 • :23 E) in eastern Siberia, which has underlying continuous permafrost, as the study site ( Figure 1). The area included the Lena River, which plays an important role in the freshwater supply to the Arctic Ocean [5,36], thermokarst lakes (i.e., open water) created by permafrost thawing [37,38], and a boreal forest (taiga) dominated by larch trees [39]. The surface water extent in this region can vary according to river runoff (primarily controlled by springtime snowmelt [5,6,40]) and both seasonal and interannual fluctuation of thermokarst lakes.
Satellite data were collected from 3 July 2012 (i.e., the beginning of AMSR2 provision) to 2018. This research focused on the detection of liquid surface water; therefore, we excluded the winter season (November-April), when surface water freezes and much of the ground is covered in snow.  The AMSR2 sensor aboard the Global Change Observation Mission-Water (GCOM-W1) spacecraft is a passive microwave sensor that observes microwave emissions from the Earth's surface at seven frequency bands (6.925-89.0 GHz) for both horizontal and vertical polarizations. The original spatial resolution (i.e., instantaneous field of view) ranges from several to several tens of kilometers, depending upon the wavebands [41]. We downloaded the Level-3 brightness temperature product from the Japan Aerospace Exploration Agency's G-Portal website [42]. The data are provided in equidistant cylindrical (latitude-longitude) projection with 0.1 • pixel spacing. AMSR2 utilizes microwave wavebands, therefore it is much less affected by clouds than MODIS, and thus no cloud screening was applied. Uncertainty in the brightness temperature is assured by the data provider to within ±1.5 K.
To match the daytime MODIS observation data, only ascending data (recorded at 13:30 h in local time) were extracted daily. To distinguish the surface water signal from the original brightness temperature, the frequency bands at 18.7, 36.5, and 89.0 GHz were used to calculate several water indices (see Section 2.2.4).

Moderate Resolution Imaging Spectroradiometer
The MODIS aboard the Aqua satellite is an optical sensor that observes radiance in 36 solar-reflective bands; unlike the other MODIS aboard the Terra satellite, only the Aqua MODIS could be used to match the overpass time (~13:30 h in local time) with ascending AMSR2. All surface-reflectance data (MYD09GA) during the study period for two sinusoidal tiles (h23v02 and h24v02) were downloaded from the Land Processes Distributed Active Archive Center data pool [43], mosaicked, and resampled by the nearestneighbor method on a lat-long projection using the HDF-EOS-to-GeoTIFF-conversion tool. The spatial resolution was 500 m and the temporal resolution was daily, although in practice, ground surface observation was less than daily owing to cloud interference.
Low-quality pixels due to cloud (or cloud-shadow) interception or snow cover were screened by checking the quality assurance data. Geolocation errors may also affect the accuracy of data fusion, particularly when using relatively high-resolution data. The AMSR2 and MODIS images of the study site were well registered to the extent that we visually checked them; however, for the high-resolution data (i.e., MODIS), we also used the phase-only correlation method to ensure co-registration of all images at subpixel scales. During the water extraction process (see Section 2.2.4), we also excluded high-reflectance pixels (more than 0.04 in MODIS band 2) to eliminate any thin cloud or snow that may have unintentionally passed the initial quality check. We confirmed that this treatment did not eliminate surface water pixels and considered it reasonable because waterbodies generally exhibit very low reflectance in the near-infrared spectral region [44].

Ancillary Data
To achieve ML performance, we used day-of-year (DOY) information and the fifth major global reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ERA5-Land) [45] as ancillary inputs when possible. Although ERA5-Land does not produce pure observational data, it includes observational information via a data assimilation scheme, which may be useful in improving ML performance. We selected 17 variables that seemed to explain surface water fluctuation from the ERA5-Land data ( Table 1). In addition to the hourly (at UTC 05:00;~14:00 h in local time) grid-scale variables (i.e., 15 variables from T2M to SWVL4), we prepared accumulated precipitation (during winter in the previous year) and snowmelt (during spring in the current year) data across the entire Lena River basin (TPAGG and SMLTAGG). This is because the snow that accumulates in the previous winter melts during the current spring, affecting river discharge in the region [40].

Water Indices
To enhance the surface water signals from the original satellite data, water indices were calculated for both MODIS and AMSR2.
For the MODIS data, we used the well-known modified normalized-difference water index (NDWI [46]), which is calculated as where G is green-surface reflectance (from MODIS band 4) and SWIR is short-wave infrared (MODIS band 7). The pixel-based calculation constructed daily NDWI images derived from MODIS, which can be target images for ML prediction. We also used NDWI to extract the water surface by thresholding NDWI > 0.04 as water. The threshold was determined through visual interpretation at random sampling points across the study site on Google Earth, with the help of a September 2019 field survey, for a total of 100 water pixels thus obtained ( Figure 1a). Compared to optical sensors, there is less consensus concerning a robust index for extracting surface water from passive microwave images. Several researchers have used brightness temperatures associated with different polarizations [47,48] or different frequencies [49,50], which capture the signal from changes in emissivity where surface water exists. Herein, we attempted the use of the normalized difference polarization index (NDPI [48]), fraction of water surface (FWS [49]) estimated by frequency (18.7 GHz and 36.5 GHz), and the basin water index (BWI [50]), which are defined as follows: where TB F, P , and e F, P denote the brightness temperature and emissivity at frequency F and polarization P (H: horizontal or V: vertical), respectively; e dry and e wet are the emissivities of dry and wet surfaces, respectively; t is the atmospheric transmissivity; T au and T ad are the upward and downward atmospheric contributions to microwave emission, respectively; T s is the land-surface temperature; and a, b, β 0 , and β 1 are fitting parameters. Based on the literature review and preliminary parameter tunings for this region, we determined the following: at 18. We checked the feasibility of using the microwave indices (NDPI, FWS 18.7,V , FWS 36.5,V , and BWI) to extract surface water [51] and found that they were significantly correlated with the water fractions derived from an optical sensor (r = 0.58 for NDPI, r = 0.51 for FWS 18.7,V , r = 0.58 for FWS 36.5,V , and r = 0.29 for BWI). However, a unique relationship (i.e., simple regression) over the entire study site between any one microwave index and the optical water fraction was difficult to obtain, mainly because it would entail considering the local relationship between these quantities for each pixel. This is why we used pixel-based or CNN-based ML approaches with multiple microwave indices.

Machine-Learning Algorithms
Our algorithms are based on two different ML approaches, namely: pixel-based prediction of MODIS images by the random forest approach [52] and bias correction of the predicted image by pix2pix. These methods were implemented in the following environment: Python 3.6.9, tensorflow 2.  Initially, we selected matched pairs by searching for high-quality (minimal cloud interference and few erroneous pixels) MODIS images and their coincident low-resolution (AMSR2 and/or ancillary) images, from which we obtained 272 reference pairs. Spatially coarse-resolution images were oversampled into 500 m resolution via bilinear interpolation [53] to match with the resolution of MODIS. In the preliminary investigation, we confirmed that the use of bilinear interpolation instead of simple oversampling (i.e., a nearest-neighbor technique) mitigates the jaggy spatial pattern [30] in the predicted image when using pixel-based data fusion. Each variable was statistically normalized such that it ranged between 0 and 1. Among the reference pairs, we pulled those of highest quality (34 in total) about once a month for validation. The remaining reference pairs were used to train the MLs.
For basic prediction by the MODIS images in the first phase, we used pixel-based random forest regression. Through the preliminary tuning, we determined the optimal number of trees to be 100 and the minimum node size to be 3. The trained random forest could predict MODIS-like images from the low-resolution images, even when the original MODIS was not obtained owing to cloud cover or lack of observation for other reasons. Popular ML methods such as the random forest [28] and support vector machine [54] techniques have been applied for downscaling satellite-driven data and have shown robust performance; however, the predicted maps tended to show lower variation ranges than the original datasets. This feature is often observed in data-fusion approaches, rendering the effective tracking of abrupt changes or extreme phenomena difficult [55,56].
To address this issue, we attempted bias correction of the first-phase results using sophisticated ML (pix2pix) in the second phase. Subtraction of the original MODIS values from the MODIS-like values predicted by the random forest approach created "bias images", on which the pix2pix method concentrated to correct the first-phase error. This treatment was expected to adjust the MODIS-like values to more realistic ones, particularly when abrupt change is involved.
In pix2pix, a generator called U-Net [57] competes with a CNN-based discriminator in the training process. The generator creates fake target images with reference to the source images, whereas the discriminator tries to distinguish these images from the originals with the help of the source images. From the perspective of spatiotemporal data fusion, the generator can create high-resolution images from coarse-resolution images after successful training. The structures ( Figure 3) and hyperparameters ( Table 2) of the generator and the discriminator were carefully tuned to balance their performances. The generator and discriminator were alternately trained by optimizing the following objective function [32] with small batches of size 4: where G and D are the generator and the discriminator, respectively, x denotes coarseresolution images, y denotes the original fine-resolution images, z denotes random noise, and λ is a tunable parameter, which we set to 20.  Pix2pix includes a convolution layer, therefore it considers spatial patterns adjacent to a pixel of interest and enables spatially smoothed prediction. In exchange, null pixels disturb the prediction for the adjacent pixels. Therefore, we filled null pixels in the reference MODIS images with monthly climatological values for each pixel.

Experiments
For best-practice optimization of data fusion over the study site, we conducted the following experiments.

Fusion-then-Thresholding vs. Thresholding-then-Fusion
Our goal was to obtain fine-resolution water maps by NDWI thresholding of MODIS images, therefore we used two data-fusion approaches: (1) predict MODIS NDWI images by data fusion, and then apply thresholding to obtain water maps; (2) obtain water maps from the original MODIS NDWI thresholding before data fusion, apply ML by training with the water maps, and directly predict these maps through data fusion. We compared the accuracies (see Section 2.4.3) of the two approaches using the random forest method and then proceeded with the better approach by applying the following process (pix2pix bias correction).

Input-Feature Selection
Input-feature selection (more accurately known as feature-vector selection) is an essential part of ML. Adding a key feature-one that effectively explains the outcome variable and is independent from the other features-improves the ML accuracy. For example, Mizuochi et al. [28] reported that adding DOY and precipitation information may increase the accuracy of data fusion over a seasonal wetland. In contrast, increasing the number of features does not necessarily lead to successful ML, because a greater number of features requires larger training samples and computational time in general (this is known as the curse of dimensionality [58]).
To see how increasing the dimension of the feature vector improves accuracy, we attempted three cases of inputting features into the random forest: (1) only AMSR2-derived features; (2) AMSR2-derived features with seasonal information (as DOY); and (3) AMSR2derived features, DOY, and ERA5-Land features ( Table 3). The total numbers of input features were 4, 6, and 23, respectively. For DOY information, cosine and sine functions were used as features, such that the end of the previous year and the beginning of the current one could be smoothly connected: Furthermore, we selected only key features to reduce the computational cost in the second phase by checking variable importance in the random forest across the water pixels (those used to determine the NDWI threshold). The 10 best features and the image predicted by the random forest method were concatenated and used as input in the pix2pix. With the 11 input features and corresponding bias images (i.e., target image) for 238 reference pairs, pix2pix was trained. Then, pix2pix predicts bias images for all days in the study period, which were added to corresponding NDWI maps tentatively created by the random forest (i.e., bias correction).
To investigate the dependence of a feature variable upon the other variables, we also investigated variance inflation factors (VIF) of the normalized 23 variables. At 30 points randomly distributed over the entire study site, sample data were collected for each validation day (a total of 34 days) and used for VIF calculation. Unlike traditional multipleregression analysis, the random forest method has a scheme that randomly extracts feature variables when creating each decision tree and is therefore robust against inter-variance dependence (also known as multicollinearity) to some extent. However, VIF may be helpful for understanding the importance of variables in terms of their independence.

Validation
The water maps derived for validation from the original MODIS and those predicted by ML (or derived from the NDWI predicted by ML) were compared to calculate the overall accuracy (OA), producer's accuracy (PA), and user's accuracy (UA). The total water fraction for all available pixels (except null pixels due to cloud cover or other reasons in the original validation map) within the study site was also compared between the original and the predicted water maps, and the mean bias (MB) and root-mean-squared error (RMSE) in the time series were calculated.
The time series for the water fraction derived from the predicted water maps were compared with an independent daily water-fraction dataset with 25 km resolution, namely the Surface Water Microwave Product Series (SWAMPS; v.3 update 1 [59]).
Furthermore, we compared the microwave-intensity maps developed from observations by synthetic-aperture radar (Sentinel 1) to the water maps produced by our algorithm to observe the spatial characteristics of errors in our maps. We obtained backscattercoefficient (σ 0 ) images (VV polarization, IW mode) acquired in 2017 at a 10 m spatial resolution through the Google Earth engine [60]; because low-σ 0 areas are caused by specular reflection over water surfaces, they are useful for detecting surface-water.
Joint Research Centre (JRC) Monthly Water History v1.2 [16] maps were also compared; these classify water/non-water pixels at 30 m resolution for each month using an expert system with Landsat series.

Preliminary Experiments by Random Forest Method
A comparison between the NDWI prediction and the ML-water map (i.e., fusion-thenthresholding vs. thresholding-then-fusion) clearly showed that the former created better water maps for all accuracy criteria and all cases of input data (Table 4). Direct water-map prediction omitted a large proportion of the water pixels, resulting in a large MB, large RMSE, and low PA. Thus, given the current experimental conditions, water maps should not be predicted directly by ML; rather, they should be derived from the NDWI maps predicted in advance by ML.
Based on NDWI map prediction by ML, the best accuracies (PA, UA, and OA) for the water map were obtained using input-features case 3 (83.9%, 85.4%, and 99.4%, respectively). An overall tendency for the addition of more features to create greater accuracy was observed: case 2 (with six features) was superior to case 1 (with four features), and case 3 (with 23 features) was superior to cases 1 and 2. However, the relative MB and RMSE did not always show better results with more features; for example, the relative RMSE in case 3 (40.7%) was better than that in case 2 (41.7%), but the relative MB was not. This may relate to overfitting of the ML model with the limited number of training samples, suggesting a need to reduce the number of input features by selecting key features.
To explore the key features, we investigated variable importance for the 23 possible features in the random forest data ( Table 5). The most important feature was DOY (sine and cosine), followed by soil temperature at 7-28 cm depth, microwave indices (NDPI; FWS 18.7, V ), volumetric soil water at 100-289-cm depth, total evapotranspiration, etc. To avoid overfitting and reduce the computational cost, only the top 10 features (DOY sin and DOY cos ; STL2; NDPI; FWS 18.7, V ; SWVL4; E; FWS 36.5, V ; STL4; and BWI) were selected for further processing.
The importance of a variable did not necessarily correspond to its independence from the other variables (Table 6). Naturally, some variables exhibited very high VIF values because the properties of the land surface (and microwave indices) are closely interlinked. DOY, E, FWS 18.7,V , and SWVL4 were relatively low VIF among the top 10 features, and thus seemed to add values in the ML implementation. Table 4. Temporal mean accuracy comparison among cases of input features and among prediction targets for available pixels across the study site. MB, mean bias of total water fraction; RMSE, root mean squared error; OA, overall accuracy of water maps; PA, producer's accuracy; UA, user's accuracy.

Case of Input Features
Fusion Target   The time-series accuracies of the random forest predictions using the top 10 input features are shown in Figures 4 and 5. Both the relative MB (−9.55%) and RMSE (40.3%) of the total water fraction across the study site were better than those of any cases in the preliminary experiments (Table 4) There was an erroneous outlier day (17 May 2013) on which the water fraction was underestimated by~0.04 and PA was much lower (i.e., large omission of water pixels). According to the original NDWI map on the day (Figure 6), it experienced an irregularly large fraction of water surface, probably because of ice-jam flooding in the spring season. Without this outlier, the performance of the first prediction was: relative MB = −3.14% and relative RMSE = 15.1%.  The similarity of monthly climatological and predicted maps ( Figure 6) suggests that the seasonal information is the primary control of prediction by the NDWI maps. This is consistent with the fact that DOY was the most important feature in the random forest prediction (Table 5). However, the predicted maps also resembled the original maps (other than the climatological maps), suggesting the importance of secondary features (e.g., STL2; NDPI; FWS 18.7, V ) for representing fluctuation or interannual change (other than regular seasonal change).

Overall Performance of the Algorithm
Based on the results of the preliminary experiment, we trained pix2pix using the top 10 input features to predict bias contained in the first prediction result by the random forest method. The best performance of pix2pix was obtained after 21 epochs of training, from which we created bias-corrected NDWI maps. Although it seemed challenging for pix2pix to further improve the well-tuned random forest results with the limited number of training samples, the final accuracy was somewhat improved (p = 0.14 in the squared error) from MB = −9.55% and RMSE = 40.3% to MB = −8.68% and RMSE = 39.1% with the irregular inundation day (17 May 2013), and from MB = −3.14% and RMSE = 15.1% to MB = −2.43% and RMSE = 14.7% without this outlier. This suggests the potential for this sophisticated ML technique to predict more accurate water maps, including irregular events.
The final NDWI maps predicted for all days during the study period are provided as a Supplementary Video (S1). In the original maps, numerous pixels were screened as cloud cover, cloud shadow, or snow, as well as implausibly high NDWI pixels (probably caused by the cloud cover and cloud shadow that survived the screening process). Hence, the data-available pixels across the study site constituted only 30.3% on average over the entire study period. The combination of the random forest and pix2pix methods successfully filled the gaps and corrected the errors for each day, resulting in virtually 100% availability of MODIS data (except for the period during which AMSR2 was unavailable).
Based on the final predicted NDWI maps, the time series of the water fraction across the study site was estimated via fine-resolution temporal frequency and compared with those from SWAMPS (Figure 7). Seasonal change patterns were generally consistent between them; in particular, both products estimated similar water fractions during the spring and autumn. However, during the summer, our product tended to estimate a larger water fraction. Large discrepancies were observed in 2012 and 2013 in particular.  Table 1 and Landsat show several different surface water observation characteristics from our maps (Figure 8). For instance, Sentinel 1 and Landsat (i.e., JRC maps) could observe small-scale thermokarst lakes, particularly along the right bank of the Lena River because of their superior spatial resolutions. In contrast, MODIS is likely to omit sub-gridscale (< 500 m) thermokarst lakes.
Sentinel 1 also is relatively unstable in automated surface water extraction, because the backscatter coefficient is affected by various factors, such as roughness across waters' surface due to wind, the existence of vegetation, and the incidence angle. Accordingly, we tentatively determined that σ 0 < −20.5 dB indicated water, based on visual interpretation. Hence, a large proportion of the surface water along the Lena River and some thermokarst lakes were omitted by thresholding (particularly on 27 July 2017 and 20 August 2017). Based on our visual interpretation of the raw σ 0 images, the surface water extent of the Lena River apparently does not tend to reflect dynamic seasonal change. In contrast, MODIS-based water maps showed continuous seasonal changes characterized by a seasonal increase in the water extent in spring to summer and a seasonal decrease in summer to autumn. The seasonal decrease in surface water extent from July to August was also confirmed in the Landsat-based product, implying consistency between the results obtained from optical sensors.
Sentinel 1 can also easily distinguish frozen river surfaces from liquid water (e.g., on 19 October 2017), whereas MODIS NDWI could not do so. In the predicted water maps, snow masking could not be applied because of the lack of original cloud-free data. Hence, in early May and late October, when the water surface is likely to freeze, the predicted maps in the absence of snow masking may overestimate surface water. Excluding those differences inevitably arising from the sensor features, MODIS and Sentinel 1 created consistent water maps for relatively large-scale water surfaces (21 June 2017).
Our algorithm created wall-to-wall water maps on a daily basis, whereas JRC maps contain no-data pixels even on a monthly basis, probably because of fewer observation opportunities by Landsat, cloud interruption, and error in the scan-line corrector in Landsat 7 (shown as a stripe no-data pattern in May). Columns from left to right: MODIS NDWI maps; Sentinel 1 backscatter coefficient (σ 0 ) maps; MODIS water maps (NDWI > 0.04); Sentinel 1 water maps (σ 0 < −20.5); and the JRC maps. Black and white in the water maps correspond to water and non-water pixels, respectively. Magenta indicates no data. For Sentinel 1, the smoothing filter (i.e., spatial averaging within the 11 × 11 window) was applied to reduce noise.

Discussion
A combination of the random forest and sophisticated (i.e., pix2pix) ML methods created gap-filled MODIS-like water maps with MB = −2.43% and RMSE = 14.7% (excluding the irregular inundation event). Given that the original MODIS dataset was available for only 30.3% of the study period, the substantial improvement in the observation frequency is clearly useful for detailed monitoring of seasonal changes in surface water, particularly during key phenological periods such as foliation and defoliation, river-runoff increase, seasonal permafrost thawing and snow melting, and freezing. The improved temporal frequency enabled wall-to-wall water maps without any no-data pixels to be provided, as compared to the existing JRC water map [16].
Our preliminary experiments revealed that ML prediction of the NDWI maps followed by thresholding created more accurate water maps than those derived solely from direct ML prediction. Similar to previous research reporting that index maps directly predicted by data fusion were more accurate than those calculated from the reflectance predicted by data fusion (i.e., "index-then-blend" was found to be better than "blend-then-index" [61]), we also confirmed the robustness of the index-based approach in terms of different aspects of the data-fusion application. The index-based approach can also provide other options, such as flexible post-tuning of the water threshold, depending upon the location or satellite scenes [62], or relating the index to other hydrological parameters.
The preliminary experiments also revealed that the use of key features in addition to the AMSR2-drived indices improved fusion accuracy. The original 23 input variables closely depend on one another (Table 6), and some are likely to have duplicated information for ML. High VIF does not necessarily mean that the variable in the ML is of lower importance; however, it may be used as an indicator to omit some redundant variables. Principal component analysis is also promising to make the input variables independent from each other. As previous research has reported [28], DOY information clearly helped to describe seasonality ( Figure 6; Table 5), which may be primarily important in characterizing variations in the river and thermokarst lakes associated with the annual cycles of snow (accumulation/melting), water (freezing/melting/ice jamming), and the active permafrost layer (freezing/thawing). However, AMSR2-derived indices (e.g., NDPI and FWS 18.7, V ) are still necessary for tracking interannual variation and sub-seasonal-scale fluctuation, apart from the regular seasonal cycles. Some meteorological parameters, such as soil temperature at 7-28-cm depth (STL2), volumetric soil water at 100-289-cm depth (SWVL4), and total evapotranspiration (E) are also important ancillary data. The soil layers of STL2 and SWVL4 (deeper part of the active layer) may be more important than the others because of the sensitivity of those layers to the hydrological and thermal features of the land surface and the active layer in this region; however, confirming this would require further site-scale investigation.
Contrary to expectations, snowpack in the previous winter and snowmelt in the spring across the basin [40] were not very useful for predictive purposes (Table 5). This is likely due to the spatiotemporal aggregation of those data across very large areas (~2,400,000 km 2 of the entire Lena basin and over several months), suggesting the importance of selecting suitable spatiotemporal scales for input features during the data fusion stage. Furthermore, unlike previous research [28], the total precipitation was found to be a variable of lower importance, which can be partly attributed to the fact that the reference high-quality MODIS data were inevitably collected from clear-sky days, upon which the rainfall and snowfall were unlikely to be observed (the so-called clear-sky bias). High-resolution microwave data (i.e., synthetic-aperture radar (SAR), such as Sentinel 1 and the Phased Array-type L-band Synthetic Aperture Radar (PALSAR) series) can be promising alternatives to optical data, because they are less affected by cloud cover, although there are less historical SAR data than optical data.
Our water maps tended to estimate a greater water fraction than SWAMPS across the study site, particularly during the summer. SWAMPS provides the water fraction at a relatively coarse (25 km) resolution on the basis of unmixing of the passive microwave data (the Special Sensor Microwave/Imager; the Special Sensor Microwave Imager Sounder) [59], and because it does not focus on local mapping in Siberia, it may omit sub-grid-scale surface water and seasonal changes of the Lena River. Our achieved 500 m resolution enabled extraction of not only the Lena River but also thermokarst lakes ( Figure 6) to some extent. The contrasting density of the thermokarst lakes between the right (1-5%) and the left (0-1%) banks of the Lena River [63,64] was consistently depicted by our water maps (e.g., right bank: 2.4%, left bank: 0.1% in the temporal mean water fraction), thus supporting the validity of our water maps.
Given the progress in producing high-resolution water maps [15,16], our water maps should be further downscaled in conjunction with high-resolution data archives (e.g., Landsat series [56]) to extract small-scale surface water data [3], given that the smallscale thermokarst lakes constitute a nonnegligible proportion of the surface water in the region [65]. The omission of more detailed surface water was indeed observed by comparison between our result and Sentinel 1 (Figure 8). The same comparison also revealed that water extraction by MODIS NDWI was more robust than that derived from the simple thresholding of the VV-polarized backscatter signal of C-band SAR. In contrast, freezing and vegetation cover mixed in with the surface water is also likely to affect NDWI, helping to explain the discrepancy in seasonal changes between the MODIS and Sentinel 1 data. Utilizing SAR's ability to penetrate vegetation cover and distinguish liquid water from the frozen surface may improve the data-fusion accuracy in creating water maps.
Expansion of the study period using other historical passive microwave data (e.g., AMSR and AMSR-E) is also important for future work. Watts et al. [8] reported a trend of increasing water fraction across wide areas of the subarctic covered by continuous permafrost, as shown by AMSR-E for the period 2003-2010, whereas our more recent water maps (2013-2018) did not exhibit such a trend according to the Mann-Kendall test, neither in the annual mean nor maximum water fractions across the study site. Creating long data records by expanding our analysis to both the past and the future will provide an in-depth understanding of the surface water dynamics of the region due to climate change.
The sophisticated ML (i.e., pix2pix) showed the potential to create more accurate water maps that include irregular events such as ice-jam inundation in the spring season. Although the setup and tuning of such ML techniques can be laborious [3], they are worth applying to address difficult mapping tasks. Expanding the study period may also contribute to obtaining more matched pairs for the ML training, resulting in a more accurate prediction.

Conclusions
Our research provides a dynamic water map with a daily frequency and a moderate (500 m) spatial resolution encompassing the heterogeneous thermokarst landscapes in eastern Siberia. The combination of the random forest and pix2pix ML methods has allowed MODIS-like water maps to be predicted from the coincident AMSR2 and ancillary maps. Preliminary random forest experiments demonstrated that post-thresholding of the NDWI maps predicted by ML was better than pre-thresholding for directly predicting water maps. We have shown that adding DOY and reanalysis data as ancillary information may improve the ML prediction accuracy, particularly when describing seasonality. The random forest prediction using 10 key variables was further corrected by the conditional GAN (pix2pix), obtaining a relative mean bias of −2.43% and a relative RMSE of 14.7% in the water fraction across the study site. The generated water maps apparently describe regional seasonality better than SWAMPS independent global water-fraction data, and successfully describe the contrasting distributions of thermokarst lakes between banks of the Lena River. Expansion of the analysis using other historical data sources such as AMSR and AMSR-E, and using high-resolution data sources such as the Landsat, Sentinel, and PALSAR series, will provide further opportunities to understand the long-term surface water dynamics across the site resulting from continuing climate change.