Achieving Higher Resolution Lake Area from Remote Sensing Images Through an Unsupervised Deep Learning Super-Resolution Method

Lakes have been identified as an important indicator of climate change and a finer lake area can better reflect the changes. In this paper, we propose an effective unsupervised deep gradient network (UDGN) to generate a higher resolution lake area from remote sensing images. By exploiting the power of deep learning, UDGN models the internal recurrence of information inside the single image and its corresponding gradient map to generate images with higher spatial resolution. The gradient map is derived from the input image to provide important geographical information. Since the training samples are only extracted from the input image, UDGN can adapt to different settings per image. Based on the superior adaptability of the UDGN model, two strategies are proposed for super-resolution (SR) mapping of lakes from multispectral remote sensing images. Finally, Landsat 8 and MODIS (moderate-resolution imaging spectroradiometer) images from two study areas on the Tibetan Plateau in China were used to evaluate the performance of UDGN. Compared with four unsupervised SR methods, UDGN obtained the best SR results as well as lake extraction results in terms of both quantitative and visual aspects. The experiments prove that our approach provides a promising way to break through the limitations of median-low resolution remote sensing images in lake change monitoring, and ultimately support finer lake applications.


Introduction
Lakes are dynamic systems that support enormous biodiversity and provide key provisioning and cultural ecosystem services to people around the world [1]. Since the changes in lakes, such as expansion and shrinkage are closely related to the effect of climate and human activities [2], lakes can act as a salient indicator of environmental change. In recent decades, accelerated climate warming and (2) A new unsupervised SR model UDGN is proposed based on a deep residual network in this paper. It does not require pretraining and can be adapted to different settings of images, such as different image sizes and channels. (3) The features of the gradient map are extracted and fused in the network to provide more geographic details in HR images. (4) We verify the effectiveness of our method with two data sets, the results demonstrate the superiority of our method in improving the spatial resolution of lake area extraction.

Study Area and Data
Two study areas are selected from the Tibetan Plateau (TP), as shown in Figure 1. The TP is the largest and highest plateau in the world, and there are numerous lakes distributed throughout the TP [4]. Along with the Arctic region and Antarctica, the Tibetan Plateau (TP) and the Mongolian Plateau (MP) are among the world's most sensitive regions to climate change [28]. Hence, monitoring accurately the changes of lakes in the TP is of great importance for climate change research. In this work, two typical areas located in the TP, China are chosen to evaluate the effectiveness and practicability of our method. Study area 1 has a large tectonic lake (Yamzho Yumco) with irregular edges, which is used to verify the superiority of our proposed SR method when dealing with lakes with complex terrain and intricate structures. In terms of study area 2, there are many small lakes such as Sangzhen and other no-name lakes (<10 km 2 ) which are difficult to be recognized in low-resolution RS images. Study area 2 is used to test the SR performance of our method for very small lakes on real low-resolution RS images. The Landsat-8 OLI, MODIS, and Sentinel 2 images are acquired from google earth engine (https://earthengine.google.com). Detailed information of the two areas is summarized in Table 1. Locations of the study areas are shown in color (R5G4B3) Landsat images in Figures 1b and c respectively. The lake images shown in Figure 1d,e are derived from the corresponding Landsat images at 30 m resolution using the NDWI method.  The Landsat-8 OLI, MODIS, and Sentinel 2 images are acquired from google earth engine (https://earthengine.google.com). Detailed information of the two areas is summarized in Table 1. Locations of the study areas are shown in color (R5G4B3) Landsat images in Figure 1b,c respectively. The lake images shown in Figure 1d,e are derived from the corresponding Landsat images at 30 m resolution using the NDWI method.  Figure 2 shows the flowchart of the proposed method for generating finer-scale lakes from RS images, the core idea is to break the spatial resolution limitations of the original RS images by introducing a super-resolution method. There are two main components of the whole workflow: lake area extraction (LAE) and image SR. Two strategies using the unsupervised deep gradient network (UDGN) model are proposed to improve the spatial resolution of the lake area.

Unsupervised Super-Resolution Mechanism
Compared to supervised learning, unsupervised learning does not require paired LR-HR images for training. In this paper, inspired by [21], a new unsupervised deep learning-based SR method UDGN is proposed.
In the proposed method, unsupervised learning is based on the hypothesis that the repeated occurrence of small image patches across scales of a single image is a very strong property of natural images [33,34]. It is the same for the RS images. As shown in Figure 3, the mountain and lake patches are shown to repeat many times inside the whole image. Therefore, our method relies only on the input LR image and exploits image-specific information to generate a super-resolved image. In LAE, the normalized difference water index (NDWI) is adopted to automatically separate water and non-water features. The NDWI has been widely used for water and lake body classification from RS images [30][31][32]. It is calculated according to the following equation: Remote Sens. 2020, 12, 1937 6 of 20 where Green denotes the green band and NIR is the near-infrared band, and the range of the color slice with 0-1 is chosen to extract the water body boundary according to [4]. The SR method aims to improve the spatial resolutions of each input LR image. The UDGN model mainly consists of four parts: feature fusion, deep feature extraction, upsampling, and reconstruction. The first part fuses the features of image gradient information and original LR images; the second part aims to extract more complex and deep features from the fused features; the third part devotes to improve the spatial resolution, and the last part finally generates the HR image.
Two strategies are proposed in this paper to improve the spatial resolution of the lake area ( Figure 2). Strategy 1 is to generate the NDWI map first, then the SR model is applied to get the NDWI map with higher spatial resolution. Finally, the lake area is extracted through the threshold. Strategy 2 is to super-resolve the original multispectral RS images via the SR model, and the NDWI map is subsequently calculated from the reconstructed HR image to identify the lake areas. The biggest difference between the two strategies is the input content of the SR model. Strategy 1 takes the NDWI product as input while strategy 2 takes the original RS image. Both two strategies construct an end-to-end high spatial resolution LAE procedure to provide better support for finer lake monitoring.
In reality, we usually encounter many challenges such as lack of sufficient RS images, and it is difficult to use existing products to obtain better spatial resolution accurately. UDGN is an unsupervised learning method, which can adapt to different image sizes, channel numbers, and types. With the properties of the UDGN method, we do not require amounts of paired images for training, thus we can directly improve the spatial resolution of products such as the NDWI maps.

Unsupervised Super-Resolution Mechanism
Compared to supervised learning, unsupervised learning does not require paired LR-HR images for training. In this paper, inspired by [21], a new unsupervised deep learning-based SR method UDGN is proposed.
In the proposed method, unsupervised learning is based on the hypothesis that the repeated occurrence of small image patches across scales of a single image is a very strong property of natural images [33,34]. It is the same for the RS images. As shown in Figure 3, the mountain and lake patches are shown to repeat many times inside the whole image. Therefore, our method relies only on the input LR image and exploits image-specific information to generate a super-resolved image.

Unsupervised Super-Resolution Mechanism
Compared to supervised learning, unsupervised learning does not require paired LR-HR images for training. In this paper, inspired by [21], a new unsupervised deep learning-based SR method UDGN is proposed.
In the proposed method, unsupervised learning is based on the hypothesis that the repeated occurrence of small image patches across scales of a single image is a very strong property of natural images [33,34]. It is the same for the RS images. As shown in Figure 3, the mountain and lake patches are shown to repeat many times inside the whole image. Therefore, our method relies only on the input LR image and exploits image-specific information to generate a super-resolved image. Specifically, the unsupervised learning mechanism is shown in Figure 4. Each test image is down-sampled first to obtain the lower-resolution image . Then, corresponding image patches Specifically, the unsupervised learning mechanism is shown in Figure 4. Each test image I test is down-sampled first to obtain the lower-resolution image I LR . Then, corresponding image patches derived from I LR and I test are collected as samples to train the UDGN model. Last, the trained model is applied to the test image to produce the HR image I HR .
derived from and are collected as samples to train the UDGN model. Last, the trained model is applied to the test image to produce the HR image . Because the learning of the model focused on a single image, it can avoid interference from other image features, image quality, noise, etc. In addition, the model can learn features more precisely and specifically. Furthermore, since our model does not require pre-training, it can adapt itself to different settings per image, such as different image sizes and different numbers of input channels. This allows us to perform SR of RS images and intermediate products (e.g., the NDWI image in this paper). The test image is firstly down-sampled to lower resolutions, and then the UDGN is trained to recover the test image from its low-resolution (LR) versions. Finally, the resulting self-supervised network is applied to the test image to produce highresolution (HR) images.

The Structure of the UDGN Model
The UDGN model aims to learn the cross-scale internal recurrence of image-specific information and use this information to improve the spatial resolution of each test image. Finally, extracting HR lakes from the super-resolved images. The architecture of the UDGN model is shown in Figure 5. The network consists of convolution (Conv) layers, rectified linear unit (ReLU) layers, fusion layer (Fusion), element-wise-sum layers, pixel-shuffle layers, and several residual blocks (ResBlock). Conv layers are to extract low-level features and the ReLU layer is taken as an activation function for nonlinear mapping. The fusion layer is to concatenate the feature maps for feature fusion. The pixelshuffle layer is to transform the feature maps into the size as desired for HR output.
In the feature fusion part, the gradient map of the LR image is obtained with Sobel Operator [35] first. The gradient map is important in boundary detection because images often change most quickly at the boundary between objects (Jacobs, 2005), and this information is important for lakes extraction. Then, two simple CNNs are built to preliminarily extract shallow features of both the LR image and its gradient map. The LR image can provide much low-frequency information. By fusing with highfrequency information contained in the gradient map, the integrated feature maps can retain more comprehensive details in a super-resolved HR image. The test image is firstly down-sampled to lower resolutions, and then the UDGN is trained to recover the test image from its low-resolution (LR) versions. Finally, the resulting self-supervised network is applied to the test image to produce high-resolution (HR) images.
Because the learning of the model focused on a single image, it can avoid interference from other image features, image quality, noise, etc. In addition, the model can learn features more precisely and specifically. Furthermore, since our model does not require pre-training, it can adapt itself to different settings per image, such as different image sizes and different numbers of input channels. This allows us to perform SR of RS images and intermediate products (e.g., the NDWI image in this paper).

The Structure of the UDGN Model
The UDGN model aims to learn the cross-scale internal recurrence of image-specific information and use this information to improve the spatial resolution of each test image. Finally, extracting HR lakes from the super-resolved images. The architecture of the UDGN model is shown in Figure 5. The network consists of convolution (Conv) layers, rectified linear unit (ReLU) layers, fusion layer (Fusion), element-wise-sum layers, pixel-shuffle layers, and several residual blocks (ResBlock). Conv layers are to extract low-level features and the ReLU layer is taken as an activation function for nonlinear mapping. The fusion layer is to concatenate the feature maps for feature fusion. The pixel-shuffle layer is to transform the feature maps into the size as desired for HR output.
In the feature fusion part, the gradient map of the LR image is obtained with Sobel Operator [35] first. The gradient map is important in boundary detection because images often change most quickly at the boundary between objects (Jacobs, 2005), and this information is important for lakes extraction. Then, two simple CNNs are built to preliminarily extract shallow features of both the LR image and its gradient map. The LR image can provide much low-frequency information. By fusing with high-frequency information contained in the gradient map, the integrated feature maps can retain more comprehensive details in a super-resolved HR image.
layers and a ReLU layer.
In the upsampling part, several pixel-shuffle layers are used to improve image size gradually. The detailed description of this kind of layer can be found in [36]. Each pixel-shuffle layer upscales two times.
In the reconstruction part, the original input LR image is interpolated to the HR size to provide In summary, the proposed network has three main characteristics: I. Deep: it can efficiently extract deep features and complete multi-spectral RS-SR tasks. II. Geographic information preservation: By the fusion of the gradient information to enhance the original image, more geoinformation such as terrain and texture can be preserved, which provides a good foundation for subsequent LAE.
III. Adaptive: it can super-resolve RS images/products of different image sizes and channels.

Evaluation Criteria
To evaluate the performance of the proposed method quantitatively, we adopt two groups of criteria, one group for SR performance evaluation and the other group for LAE accuracy evaluation.
Peak signal-to-noise ratio (PSNR) [37], structural similarity index (SSIM) [38], the normalized root mean square error (NRMSE) [25], and the spectral angle mapper (SAM) [39] are used to evaluate the SR performance. PSNR is measured in decibels (dB). The larger the PSNR and SSIM, the better the SR performance. The smaller the values of NRMSE and SAM, the better the SR effect.
In terms of LAE accuracy assessment, overall accuracy (OA), Kappa coefficient (KC), average producer's accuracy (APA), and average user's accuracy (AUA) are utilized. These criteria have been used in many types of research, such as water body extraction [40], urban flooding mapping [41], and wetland inundation mapping [9]. A higher value of these criteria indicates the method is of higher quality. In the deep feature extraction part, several ResBlocks are devoted to extract high-level features and learn the complex mapping between LR and HR images. Each ResBlock consists of two Conv layers and a ReLU layer.
In the upsampling part, several pixel-shuffle layers are used to improve image size gradually. The detailed description of this kind of layer can be found in [36]. Each pixel-shuffle layer upscales two times.
In the reconstruction part, the original input LR image is interpolated to the HR size to provide global low-frequency information. By integrating the interpolated image and the HR residual, the HR image is finally obtained.
In summary, the proposed network has three main characteristics: I. Deep: it can efficiently extract deep features and complete multi-spectral RS-SR tasks. II. Geographic information preservation: By the fusion of the gradient information to enhance the original image, more geoinformation such as terrain and texture can be preserved, which provides a good foundation for subsequent LAE. III. Adaptive: it can super-resolve RS images/products of different image sizes and channels.

Evaluation Criteria
To evaluate the performance of the proposed method quantitatively, we adopt two groups of criteria, one group for SR performance evaluation and the other group for LAE accuracy evaluation.
Peak signal-to-noise ratio (PSNR) [37], structural similarity index (SSIM) [38], the normalized root mean square error (NRMSE) [25], and the spectral angle mapper (SAM) [39] are used to evaluate the SR performance. PSNR is measured in decibels (dB). The larger the PSNR and SSIM, the better the SR performance. The smaller the values of NRMSE and SAM, the better the SR effect.
In terms of LAE accuracy assessment, overall accuracy (OA), Kappa coefficient (KC), average producer's accuracy (APA), and average user's accuracy (AUA) are utilized. These criteria have been used in many types of research, such as water body extraction [40], urban flooding mapping [41], and wetland inundation mapping [9]. A higher value of these criteria indicates the method is of higher quality.

Architecture Details of UDGN
When the upscale factor is 4, the specific settings of the components of UDGN are listed in Table 2. The kernel size of each Conv layer is 3 × 3. During the training phase, the loss function is an L1 loss, and the optimization algorithm is Adam. The learning rate is set to 0.001, and multiplied by 0.1 after 60 epochs.  [21]. Specifically, at each iteration, we take a random crop of fixed size from a randomly-selected example pair. The crop size should be smaller than the size of the input image. In this paper, the crop size is typically set to 256×256. In addition, we use augmentation methods to generate more training examples to fully train the model. The augmentation methods include flipping, rotating, and panning the image randomly.

Results of Two Strategies
To compare the performance of the two strategies for producing a higher resolution lake area, we test our method on Landsat 8 dataset of study area 1. For strategy 1, the NDWI map is used as the input of the SR model, while for strategy 2, the original RGB RS image is used as the input. Band 5 and Band 3 are the NIR band and Green band, respectively, which are used to calculate the NDWI map as formulated in Equation (1). The LAE results are shown in Table 3, and the visual results when the upscale factor is 4 is shown in Figure 6. In addition, Figure 7 further demonstrates the difference between the SR image and the ground-truth image. From the global aspect, Table 3 indicates that the performance of a 3 channel input is better than 1 channel. For instance, when the upscale factor is 8, the Kappa of strategy 2 is 0.9718, while that of strategy 1 is 0.9372. In addition, except for the AUA values, the OA, APA, and Kappa results of strategy 2 are all larger than strategy 1. In terms of the visual results, it is apparent that strategy 1 tends to obtain lake areas with more noises around the edges. As we can see from Figure 7b, there are many green pixels around the lakes, indicating that many land pixels are wrongly classified as lakes. In addition, using strategy 1, some small rivers (in purple circles) are super-resolved to be much bigger than the actual. In contrast, strategy 2 is able to achieve HR images with sharper and clearer lake structures and the noises are much less than strategy 1. The reasons for the different effects of the two strategies are as follows: first, in the NDWI calculating process, much important information is lost such as other land cover feature types, different spectral information, and topographic information around the lake. In strategy 1, the NDWI image is generated firstly and then the super-resolution is conducted directly on the NDWI image.  can be well preserved in the HR image. In this way, strategy 2 is able to further improve the LAE accuracy.
In the real world, it depends on the specific demand to choose the proper strategy. For example, when the acquisition, splicing, and fusion process of original RS images is difficult. Researches can use strategy 1 to directly improve the spatial resolution of the previously generated product (e.g, lake/NDWI map). When the original image is readily available, it is recommended to use strategy 2 to obtain the lake more accurately. In addition, the fewer image channels that are input to the SR model, the less computing resources and memory they consume.   The reasons for the different effects of the two strategies are as follows: first, in the NDWI calculating process, much important information is lost such as other land cover feature types, different spectral information, and topographic information around the lake. In strategy 1, the NDWI image is generated firstly and then the super-resolution is conducted directly on the NDWI image. Edge noise is easy to occur during the super-resolution since there is no more neighboring topographic and spectral information to provide as constraints. On the contrary, the calculating of NDWI is the last step of strategy 2, which can avoid the problems faced with strategy 1. In addition, using the original multi-band image as the input of the SR method, the obtained gradient map can better reflect the real terrain condition and the relative values of the Green band and the NIR band can be well preserved in the HR image. In this way, strategy 2 is able to further improve the LAE accuracy.
In the real world, it depends on the specific demand to choose the proper strategy. For example, when the acquisition, splicing, and fusion process of original RS images is difficult. Researches can use strategy 1 to directly improve the spatial resolution of the previously generated product (e.g, lake/NDWI map). When the original image is readily available, it is recommended to use strategy 2 to obtain the lake more accurately. In addition, the fewer image channels that are input to the SR model, the less computing resources and memory they consume.

Comparison with Different SR Methods
In this section, our method is tested on the Landsat 8 dataset of study area 1. The process of all the methods is consistent with strategy 1. Firstly, the NDWI map is obtained using band 5 (NIR band) and band3 (Green band) of the Landsat 8 image. The experiments are carried out with three different upscale factors, i.e., 2, 4, and 8. Since there is no real LR-HR paired data, the original NDWI image is down-sampled using the BCI algorithm with corresponding factors to obtain LR images of different scales.
In addition, to verify the effectiveness and superiority of our method, different types of unsupervised SR methods including traditional method (i.e., BCI), machine learning methods (i.e., IBP [23] and TSR [24]) and deep learning method (i.e., ZSSR [24]) and a supervised SR method super-resolution convolutional neural network (SRCNN) [42] are used to compare the SR performance as well as the accuracy of LAE. All the methods are used considering the default settings suggested by the authors. IBP and TSR are implemented in MATLAB while others are implemented in Python. The detailed results of different upscale factors and different methods are shown in Table 4. From a global perspective, SR with larger upscale factors has worse results. Furthermore, as we can see from Table 4, our proposed method achieves the best results on all the evaluation criteria. In terms of the SR performance, the calculated PSNR, SSIM, NRMSE, and SAM results illustrate that our method can reconstruct information from LR images better than other methods. For example, when the upscale factor is 4, the PSNR and SSIM of our method are 35.1123 dB and 0.9726, respectively, while the values of other methods are smaller than 34.6dB and 0.965, especially the BCI results, which are the worst (BCI: 29.4002dB, 0.9481).
Compared with the supervised method SRCNN, unsupervised methods are superior in image super-resolution. As we can see that when the upscale factor is 2, the PSNR and SSIM results of SRCNN are smaller than TSR. When the upscale factor is 8, the PSNR, NRMSE, and SAM results of SRCNN are even worse than BCI. This may be related to the lack of sufficient training samples. In addition, since the image size of the training samples used for supervised learning must be the same, it is necessary to cut the test image of a larger size into several small patches for super-resolution. This will affect the SR and LAE results due to a lack of global information.
Comparing different types of unsupervised methods, deep learning methods (i.e., ZSSR and UDGN) are superior to BCI and machine learning methods. For example, when the upscale factor is 2, the PSNR values of deep learning methods are larger than 39dB, while PSNR values of BCI and the best machine learning method (TSR) is 33.4083dB and 37.2856dB, respectively. This is because interpolation methods do not consider the prior information of the LR images and handcrafted prior features used in machine learning methods are not sufficiently competent for the SR task.
Furthermore, the highest OA, AUA, APA, and Kappa values verify that our method has a strong ability in preserving the lake structure and edges accurately in the HR images, and further improve the spatial resolution of lakes. For example, when the upscale factor is 8, the OA values of BCI, IBP, TSR, ZSSR, and UDGN are 0.9562, 0.9284, 0.9390, 0.9386, and 0.9809, respectively.
In addition to the quantitative assessments, the visual results ( Figure 8) when the upscale factor is 8 are provided for a qualitative and intuitive evaluation of SR performance. Focusing on Figure 8c, the images obtained from BCI are the most blurred. This is because BCI relies heavily on the values of neighboring pixels, while other important prior information such as textures are ignored. In addition, the images super-resolved through IBP and ZSSR have obvious shadows around the edges, especially the IBP results. The existence of these edge shadows will lead to misclassification of lake margins (i.e., more land areas are classified as lake area). As for TSR results, there are regular pyramid shapes in some areas. This is related to the fact that TSR builds the internal LR-HR patch database using the scale-space pyramid of the image. These pyramid shapes, which may cross land and lakes and add more noises, will largely affect the LAE accuracy. Hence, although ZSSR and TSR can generate much sharper NDWI images than BCI, the OA values are smaller than BCI (Table 4).  The proposed UDGN is able to get high-resolution images without adding more noise. Using the deep CNN architecture with global and local residual blocks, more deep features and high-frequency information can be captured to improve the SR performance. Furthermore, by fusion of the features extracted from both the gradient map and the original test image, more geographic information details such as terrain and lake edges can remain in HR images, thus obtaining the lake area more precisely. As we can see from Figure 8h, the lake edges are sharper than other methods, and the details on the small corners are closer to reality.
As a whole, by fusing the important gradient information and learning the deep internal features of the given NDWI image, our method can significantly improve the spatial resolution of lakes, which is very important for further analysis and practical applications.

Results of Lake Extraction from MODIS Data
In this section, we verify the effectiveness of our UDGN model in real-world scenarios. The MODIS data at study area 2 is used as the experimental dataset. Since strategy 2 outperforms strategy 1 when there are original RS images, we use the multispectral MODIS image (band 2, band 1, band 4) as the input of the UDGN model to obtain the desired high-resolution lake area. Specifically, the MODIS image with a spatial resolution of 500m is improved to 30m using the UDGN model. Then, the NDWI is calculated and marks the area where NDWI values are larger than 0, as the lake area. Figure 9 shows the color images, LAE results, and provides the close-ups of some typical lakes including Pongyin Co, Timachaka, and Noname Lake. The first column presents the MODIS data, and the second column shows the predicted HR data. Landsat 8 data is displayed in the third column to reflect the actual conditions. Moreover, for the quantitative assessment of the SR performance, we roughly estimate the area of the selected three lakes by calculating the number of pixels. The results are shown in Table 5, and the estimated results are compared to the data of 2014 provided in [29].

Results of Lake Extraction from MODIS Data
In this section, we verify the effectiveness of our UDGN model in real-world scenarios. The MODIS data at study area 2 is used as the experimental dataset. Since strategy 2 outperforms strategy 1 when there are original RS images, we use the multispectral MODIS image (band 2, band 1, band 4) as the input of the UDGN model to obtain the desired high-resolution lake area. Specifically, the MODIS image with a spatial resolution of 500m is improved to 30m using the UDGN model. Then, the NDWI is calculated and marks the area where NDWI values are larger than 0, as the lake area. Figure 9 shows the color images, LAE results, and provides the close-ups of some typical lakes including Pongyin Co, Timachaka, and Noname Lake. The first column presents the MODIS data, and the second column shows the predicted HR data. Landsat 8 data is displayed in the third column    As shown in Figure 9, the lakes derived from 500m resolution MODIS data have obvious serrated boundaries because each pixel covers a large area. It is hard to depict the actual shape of small lakes through limited pixels. By using our proposed UDGN model, more detailed structure and edge information are reconstructed, thereby, the shape of the lake is more realistic and the boundaries are smoother. In addition, the estimated area of predicted HR image is much closer to the reference data, regardless of large lakes (>10 km 2 ) or small lakes (<10 km 2 , <5 km 2 ). For instance, the area of Timachaka calculated from MODIS and predicted HR data are 6.75 km 2 and 7.5051 km 2 , respectively, and the reference is 7.439273 km 2 . The results derived from the super-resolved image are much more accurate.
In addition, it is worth noting that our proposed method is able to discover small lakes from MODIS data. As illustrated in Figure 9, Noname Lake (Area: 1.104846 km 2 ) is missing in the lake map derived from MODIS data, while the lake can be successfully extracted by improving the resolution of the image based on the UDGN model. It is of great significance in small lakes monitoring and further climate change analysis. Although there may exist differences between the predicted HR image and Landsat 8 image, it makes a big step forward to discover small lakes that cannot be figured out from the original low-resolution RS images.

Discussion
SR techniques can help generate a finer lake area from RS images. In this part, the practicality of the proposed method is further analyzed. In our method, the self-learning process is done at test time. To better show the training process, the experiment is conducted taking the NDWI image of study area 1 as an example, the settings of this experiment are the same as Section 3.3. Figure 10 shows the PSNR values versus iterations when the upscale factor is 4. It can be seen that the PSNR values basically converge after about 2000 iterations. In addition, the runtime of UDGN is important in practice, and it is related to image size, upscale factors, etc. The runtime to upscale a single image of 128 × 128, 256 × 256, and 512 × 512 by 2 times is about 47, 58, and 67 seconds, respectively (on a GeForce GTX 1080 GPU). In our method, the self-learning process is done at test time. To better show the training process, the experiment is conducted taking the NDWI image of study area 1 as an example, the settings of this experiment are the same as Section 3.3. Figure 10 shows the PSNR values versus iterations when the upscale factor is 4. It can be seen that the PSNR values basically converge after about 2000 i Figure 10. The Peak signal-to-noise ratio (PSNR) values versus iterations when the upscale factor is 4.
In the real world, researchers prefer to use the best available datasets such as 10 m Sentinel 2 data and mainly apply SR to the lower resolution images in order to produce a more coherent time series. Therefore, we use the UDGN model to super-resolve MODIS data (spatial resolution: 500m) In the real world, researchers prefer to use the best available datasets such as 10 m Sentinel 2 data and mainly apply SR to the lower resolution images in order to produce a more coherent time series. Therefore, we use the UDGN model to super-resolve MODIS data (spatial resolution: 500m) and Landsat 8 data (spatial resolution: 30m) to 10m, respectively. Then, the lakes extracted from the SR images are compared with lakes extracted from a 10 m Sentinel 2 image from the same period. The results are shown in Figure 11. We can see that the lakes extracted from predicted HR data have sharper edges and details. The area of the Kongkong Caka in the original MODIS, Landsat 8, and Sentinel 2 data are 57.00 km 2 , 60.7230 km 2 , and 60.4132 km 2 , respectively. Taking the Sentinel 2 data as ground-truth, the area of the Kongkong Caka extracted from the predicted HR data is closer to the ground-truth. This illustrates that the proposed method has strong practicability and can help to improve the spatial resolution of RS images and generate finer lakes. and Landsat 8 data (spatial resolution: 30m) to 10m, respectively. Then, the lakes extracted from the SR images are compared with lakes extracted from a 10 m Sentinel 2 image from the same period.
The results are shown in Figure 11. We can see that the lakes extracted from predicted HR data have s

Conclusions
Lake monitoring is very important for environmental and climate change studies. RS is widely used in this task, however, due to the constraints of the spatial resolution, most studies focused on large lakes, while considering small lakes is much more challenging. Therefore, in this study, we

Conclusions
Lake monitoring is very important for environmental and climate change studies. RS is widely used in this task, however, due to the constraints of the spatial resolution, most studies focused on large lakes, while considering small lakes is much more challenging. Therefore, in this study, we propose a new deep learning-based method UDGN for super-resolution mapping of lakes from multispectral RS images. Unsupervised learning mechanism is exploited, which does not require a large amount of LR-HR paired samples for training. For each test image, the gradient map is obtained to retain more detailed geographical information such as edges and structures. Then, an image-specific residual network is trained at test time to improve the spatial resolution of each test image. As such, the UDGN model can deal with different image sizes, different image channels, and different upscale factors. Based on all the above, the Landsat 8 OLI and MODIS images from two study areas from the Tibetan Plateau in China are used as experimental data. Our method is applied to improve the images with different upscale factors. The results show that our method outperforms four approaches (i.e., BCI, IBP, TSR, and ZSSR) from both visual and quantitative perspectives. It is worth noting that our method is able to discover invisible small lakes from MODIS data, which provides a way to break through the spatial resolution of RS images and better support lake studies.